1 Rates of Change


1.1 Change in discrete steps

Toward the end of the eighteenth century, a German elementary school teacher decided to keep his pupils busy by assigning them a long, boring arithmetic problem: to add up all the numbers from one to a hundred.$^1$ The children set to work on their slates, and the teacher lit his pipe, confident of a long break. But almost immediately, a boy named Carl Friedrich Gauss brought up his answer: 5,050.


a  /  Adding  the  numbers 
from  1  to  7. 


Figure a suggests one way of solving this type of problem. The filled-in columns of the graph represent the numbers from 1 to 7, and adding them up means finding the area of the shaded region. Roughly half the square is shaded in, so if we want only an approximate solution, we can simply calculate $7^2/2 = 24.5$.

1 I’m giving my own retelling of a hoary legend. We don’t really know the exact problem, just that it was supposed to have been something of this flavor.

b / A trick for finding the sum.

But, as suggested in figure b, it’s not much more work to get an exact result. There are seven sawteeth sticking out above the diagonal, with a total area of $7/2$, so the total shaded area is $(7^2 + 7)/2 = 28$. In general, the sum of the first $n$ numbers will be $(n^2 + n)/2$, which explains Gauss’s result: $(100^2 + 100)/2$ = 5,050.
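As a quick check of the formula, the following sketch compares the closed form $(n^2 + n)/2$ with a brute-force sum. The function names are made up for this illustration and do not come from the text.

```python
def closed_form_sum(n):
    """Sum of 1 + 2 + ... + n using the formula (n^2 + n)/2."""
    return (n * n + n) // 2  # n^2 + n is always even, so integer division is exact


def brute_force_sum(n):
    """Add the numbers one at a time, the way the teacher intended."""
    return sum(range(1, n + 1))


for n in (7, 100):
    print(n, brute_force_sum(n), closed_form_sum(n))
```

The two columns agree for both the figure's case ($n = 7$) and Gauss's case ($n = 100$).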

Two  sides  of  the  same  coin 

Problems like this come up frequently. Imagine that each household in a certain small town sends a total of one ton of garbage to the dump every year. Over time, the garbage accumulates in the dump, taking up more and more space.
c / Carl Friedrich Gauss (1777-1855), a long time after graduating from elementary school.

Let’s label the years as $n = 1, 2, 3, \ldots$, and let the function$^2$ $x(n)$ represent the amount of garbage that has accumulated by the end of year $n$. If the population is constant, say 13 households, then garbage accumulates at a constant rate, and we have $x(n) = 13n$.

But maybe the town’s population is growing. If the population starts out as 1 household in year 1, and then grows to 2 in year 2, and so on, then we have the same kind of problem that the young Gauss solved. After 100 years, the accumulated amount of garbage will be 5,050 tons. The pile of refuse grows more quickly every year; the rate of change of $x$ is not constant. Tabulating the examples we’ve done so far, we have this:

2 Recall that when $x$ is a function, the notation $x(n)$ means the output of the function when the input is $n$. It doesn’t represent multiplication of a number $x$ by a number $n$.


rate of change    accumulated result
$13$              $13n$
$n$               $(n^2 + n)/2$

The rate of change of the function $x$ can be notated as $\dot{x}$. Given the function $\dot{x}$, we can always determine the function $x$ for any value of $n$ by doing a running sum.

Likewise, if we know $x$, we can determine $\dot{x}$ by subtraction. In the example where $x = 13n$, we can find $\dot{x} = x(n) - x(n-1) = 13n - 13(n-1) = 13$. Or if we knew that the accumulated amount of garbage was given by $(n^2 + n)/2$, we could calculate the town’s population like this:

$$\frac{n^2 + n}{2} - \frac{(n-1)^2 + (n-1)}{2} = \frac{n^2 + n - (n^2 - 2n + 1 + n - 1)}{2} = n$$
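The two directions, running sum and subtraction, can be sketched in a few lines. This is only an illustration: the helper names `running_sum` and `differences` are invented here, and the starting value $x(0) = 0$ is assumed, as in the garbage example.

```python
from itertools import accumulate


def running_sum(rates):
    """Given the rates for years 1, 2, ..., return x(1), x(2), ..., assuming x(0) = 0."""
    return list(accumulate(rates))


def differences(values, start=0):
    """Recover the rates x(n) - x(n-1) from x(1), x(2), ..., given x(0) = start."""
    previous = [start] + list(values[:-1])
    return [v - p for v, p in zip(values, previous)]


population = list(range(1, 11))    # 1 household in year 1, 2 in year 2, ...
garbage = running_sum(population)  # accumulated tons at the end of each year
print(garbage)
print(differences(garbage))        # recovers the population list
```

Applying `differences` to the output of `running_sum` returns the original list of rates, which is the discrete relationship worked out algebraically above.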


d / $\dot{x}$ is the slope of $x$.

The graphical interpretation of this is shown in figure d: on a graph of $x = (n^2 + n)/2$, the slope of the line connecting two successive points is the value of the function $\dot{x}$.

In other words, the functions $x$ and $\dot{x}$ are like different sides of the same coin. If you know one, you can find the other, with two caveats.

First, we’ve been assuming implicitly that the function $x$ starts out at $x(0) = 0$. That might not be true in general. For instance, if we’re adding water to a reservoir over a certain period of time, the reservoir probably didn’t start out completely empty. Thus, if we know $\dot{x}$, we can’t find out everything about $x$ without some further information: the starting value of $x$. If someone tells you $\dot{x} = 13$, you can’t conclude $x = 13n$, but only $x = 13n + c$, where $c$ is some constant. There’s no such ambiguity if you’re going the opposite way, from $x$ to $\dot{x}$. Even if $x(0) \neq 0$, we still have $\dot{x} = 13n + c - [13(n-1) + c] = 13$.

Second, it may be difficult, or even impossible, to find a formula for the answer when we want to determine the running sum $x$ given a formula for the rate of change $\dot{x}$. Gauss had a flash of insight that led him to the result $(n^2 + n)/2$, but in general we might only be able to use a computer spreadsheet to calculate a number for the running sum, rather than an equation that would be valid for all values of $n$.

Some  guesses 

Even though we lack Gauss’s genius, we can recognize certain patterns. One pattern is that if $\dot{x}$ is a function that gets bigger and bigger, it seems like $x$ will be a function that grows even faster than $\dot{x}$. In the example of $\dot{x} = n$ and $x = (n^2 + n)/2$, consider what happens for a large value of $n$, like 100. At this value of $n$, $\dot{x} = 100$, which is pretty big, but even without pawing around for a calculator, we know that $x$ is going to turn out really really big. Since $n$ is large, $n^2$ is quite a bit bigger than $n$, so roughly speaking, we can approximate $x \approx n^2/2$ = 5,000. 100 may be a big number, but 5,000 is a lot bigger. Continuing in this way, for $n = 1000$ we have $\dot{x} = 1000$, but $x \approx$ 500,000; now $x$ has far outstripped $\dot{x}$. This can be a fun game to play with a calculator: look at which functions grow the fastest. For instance, your calculator might have an $x^2$ button, an $e^x$ button, and a button for $x!$ (the factorial function, defined as $x! = 1 \cdot 2 \cdot \ldots \cdot x$, e.g., $4! = 1 \cdot 2 \cdot 3 \cdot 4 = 24$). You’ll find that $50^2$ is pretty big, but $e^{50}$ is incomparably greater, and $50!$ is so big that it causes an error.
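The calculator game can also be played in Python. Unlike the calculator described in the text, Python integers have arbitrary precision, so $50!$ is computed exactly rather than causing an error. The variable names are just for this sketch.

```python
import math

n = 50
square = n ** 2                # 50^2
exponential = math.exp(n)      # e^50, as a floating-point number
factorial = math.factorial(n)  # 50!, as an exact integer

# Print each in scientific notation to compare orders of magnitude.
print(f"n^2 = {square:.3e}")
print(f"e^n = {exponential:.3e}")
print(f"n!  = {factorial:.3e}")
```

The printed exponents make the ranking plain: the square is tiny compared to the exponential, which is in turn tiny compared to the factorial.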

All the $x$ and $\dot{x}$ functions we’ve seen so far have been polynomials. If $x$ is a polynomial, then of course we can find a polynomial for $\dot{x}$ as well, because if $x$ is a polynomial, then $x(n) - x(n-1)$ will be one too. It also looks like every polynomial we could choose for $\dot{x}$ might also correspond to an $x$ that’s a polynomial. And not only that, but it looks as though there’s a pattern in the power of $n$. Suppose $x$ is a polynomial, and the highest power of $n$ it contains is a certain number, the “order” of the polynomial. Then $\dot{x}$ is a polynomial of that order minus one. Again, it’s fairly easy to prove this going one way, passing from $x$ to $\dot{x}$, but more difficult to prove the opposite relationship: that if $\dot{x}$ is a polynomial of a certain order, then $x$ must be a polynomial with an order that’s greater by one.

We’d imagine, then, that the running sum of $\dot{x} = n^2$ would be a polynomial of order 3. If we calculate $x(100) = 1^2 + 2^2 + \ldots + 100^2$ on a computer spreadsheet, we get 338,350, which looks suspiciously close to 1,000,000/3. It looks like $x(n) = n^3/3 + \ldots$, where the dots represent terms involving lower powers of $n$ such as $n^2$. The fact that the coefficient of the $n^3$ term is 1/3 is proved in problem 21 on p. 23.
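The spreadsheet calculation can be reproduced with a short script. This is a numerical check only, not a proof; the function name `sum_of_squares` is an assumption of this sketch.

```python
def sum_of_squares(n):
    """Running sum x(n) = 1^2 + 2^2 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))


for n in (10, 100, 1000):
    x = sum_of_squares(n)
    # If x(n) = n^3/3 + (lower powers), this ratio should approach 1/3.
    print(n, x, x / n ** 3)
```

As $n$ grows, the ratio $x(n)/n^3$ settles toward $1/3$, consistent with the guess that the leading term is $n^3/3$.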

Example  1 

Figure e shows a pyramid consisting of a single cubical block on top, supported by a $2 \times 2$ layer, supported in turn by a $3 \times 3$ layer. The total volume is $1^2 + 2^2 + 3^2$, in units of the volume of a single block.

e / A pyramid with a volume of $1^2 + 2^2 + 3^2$.

Generalizing to the sum $x(n) = 1^2 + 2^2 + \ldots + n^2$, and applying the result of the preceding paragraph, we find that the volume of such a pyramid is approximately $(1/3)Ah$, where $A = n^2$ is the area of the base and $h = n$ is the height.

When $n$ is very large, we can get as good an approximation as we like to a smooth-sided pyramid, and the error incurred in $x(n) \approx (1/3)n^3 + \ldots$ by omitting the lower-order terms $\ldots$ can be made as small as desired.

We therefore conclude that the volume is exactly $(1/3)Ah$ for a smooth-sided pyramid with these proportions.

This is a special case of a theorem first proved by Euclid (propositions XII-6 and XII-7) two thousand years before calculus was invented.

1.2 Continuous change

Did you notice that I sneaked something past you in the example of water filling up a reservoir? The $x$ and $\dot{x}$ functions I’ve been using as examples have all been functions defined on the integers, so they represent change that happens in discrete steps, but the flow of water into a reservoir is smooth and continuous. Or is it? Water is made out of molecules, after all. It’s just that water molecules are so small that we don’t notice them as individuals. Figure g shows a graph that is discrete, but almost appears continuous because the scale has been chosen so that the points blend together visually.

f / Isaac Newton (1643-1727)

g / On this scale, the graph of $(n^2 + n)/2$ appears almost continuous.

The physicist Isaac Newton started thinking along these lines in the 1660’s, and figured out ways of analyzing $x$ and $\dot{x}$ functions that were truly continuous. The notation $\dot{x}$ is due to him (and he only used it for continuous functions). Because he was dealing with the continuous flow of change, he called his new set of mathematical techniques the method of fluxions, but nowadays it’s known as the calculus.

h / The function $x(t) = t^2/2$, and its tangent line at the point $(1, 1/2)$.


Newton was a physicist, and he needed to invent the calculus as part of his study of how objects move. If an object is moving in one dimension, we can specify its position with a variable $x$, and $x$ will then be a function of time, $t$. The rate of change of its position, $\dot{x}$, is its speed, or velocity. Earlier experiments by Galileo had established that when a ball rolled down a slope, its position was proportional to $t^2$, so Newton inferred that a graph like figure h would be typical for any object moving under the influence of a constant force. (It could be $7t^2$, or $t^2/42$, or anything else proportional to $t^2$, depending on the force acting on the object and the object’s mass.)

i / This line isn’t a tangent line: it crosses the graph.

Because the functions are continuous, not discrete, we can no longer define the relationship between $x$ and $\dot{x}$ by saying $x$ is a running sum of $\dot{x}$’s, or that $\dot{x}$ is the difference between two successive $x$’s. But we already found a geometrical relationship between the two functions in the discrete case, and that can serve as our definition for the continuous case: $x$ is the area under the graph of $\dot{x}$, or, if you like, $\dot{x}$ is the slope of the graph of $x$. For now we’ll concentrate on the slope idea.

This definition is still a little vague, because we haven’t defined what we mean by the “slope” of a curving graph. For a discrete graph like figure d, we could define it as the slope of the line drawn between neighboring points. Visually, it’s clear that the continuous version of this is something like the line drawn in figure h. This is referred to as the tangent line.


We still need to convert this intuitive idea of a tangent line into a formal definition. In a typical example like figure h, the tangent line can be defined as the line that touches the graph at a certain point, but, unlike the line in figure i, doesn’t cut across the graph at that point.$^3$ By measuring with a ruler on figure h, we find that the slope is very close to 1, so evidently $\dot{x}(1) = 1$. To prove this, we construct the function representing the line: $\ell(t) = t - 1/2$. We want to prove that this line doesn’t cross the graph of $x(t) = t^2/2$. The difference between the two functions, $x - \ell$, is the polynomial $t^2/2 - t + 1/2$, and this polynomial will be zero for any value of $t$ where the line touches or crosses the curve. We can use the quadratic formula to find these points, and the result is that there is only one of them, which is $t = 1$. Since $x - \ell$ is positive for at least some points to the left and right of $t = 1$, and it only equals zero at $t = 1$, it must never be negative, which means that the line always lies below the curve, never crossing it.
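An alternative to the quadratic formula, offered here only as a cross-check and not as the argument the text uses, is to complete the square in the difference $x - \ell$:

```latex
\[
  x(t) - \ell(t) \;=\; \frac{t^2}{2} - t + \frac{1}{2}
  \;=\; \frac{1}{2}\left(t^2 - 2t + 1\right)
  \;=\; \frac{1}{2}\,(t - 1)^2 \;\ge\; 0 ,
\]
\[
  \text{with equality only at } t = 1 .
\]
```

Written this way, it is visible at a glance that the difference is never negative and vanishes only at $t = 1$, which is the same conclusion reached above.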


3 In the case where the original graph is itself a line, the tangent line simply coincides with the graph, and this also satisfies the definition, because the tangent line doesn’t cut across the graph; it lies on top of it. There is one other exceptional case, called a point of inflection, which we won’t worry about right now. For a more complicated definition that correctly handles all the cases, see page 139.
A derivative


That proves that $\dot{x}(1) = 1$, but it was a lot of work, and we don’t want to do that much work to evaluate $\dot{x}$ at every value of $t$. There’s a way to avoid all that, and find a formula for $\dot{x}$. Compare figures h and j. They’re both graphs of the same function, and they both look the same. What’s different? The only difference is the scales: in figure j, the $t$ axis has been shrunk by a factor of 2, and the $x$ axis by a factor of 4. The graph looks the same, because doubling $t$ quadruples $t^2/2$. The tangent line here is the tangent line at $t = 2$, not $t = 1$, and although it looks like the same line as the one in figure h, it isn’t, because the scales are different. The line in figure h had a slope of rise/run $= 1/1 = 1$, but this one’s slope is $4/2 = 2$. That means $\dot{x}(2) = 2$. In general, this scaling argument shows that $\dot{x}(t) = t$ for any $t$.
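The conclusion that the slope of $t^2/2$ at time $t$ equals $t$ can be checked numerically. The sketch below estimates the slope from two nearby points on the curve; this is a numerical stand-in for the tangent line, not the definition used in the text, and the step size `h` is an arbitrary choice made for this illustration.

```python
def x(t):
    """The function from figures h and j."""
    return t * t / 2


def slope(f, t, h=1e-6):
    """Estimate the slope of f at t from the points at t - h and t + h."""
    return (f(t + h) - f(t - h)) / (2 * h)


for t in (1, 2, 5):
    print(t, round(slope(x, t), 6))
```

Each estimated slope comes out equal to the value of $t$ at which it was taken, to within rounding.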


j / The function $t^2/2$ again. How is this different from figure h?


This is called differentiating: finding a formula for the function $\dot{x}$, given a formula for the function $x$. The term comes from the idea that for a discrete function, the slope is the difference between two successive values of the function. The function $\dot{x}$ is referred to as the derivative of the function $x$, and the art of differentiating is differential calculus. The opposite process, computing a formula for $x$ when given $\dot{x}$, is called integrating, and makes up the field of integral calculus; this terminology is based on the idea that computing a running sum is like putting together (integrating) many little pieces.

Note the similarity between this result for continuous functions,

$$x = t^2/2 \qquad \dot{x} = t,$$

and our earlier result for discrete ones,

$$x = (n^2 + n)/2 \qquad \dot{x} = n.$$

The similarity is no coincidence. A continuous function is just a smoothed-out version of a discrete one. For instance, the continuous version of the staircase function shown in figure b on page 7 would simply be a triangle without the sawteeth sticking out; the area of those ugly sawteeth is what’s represented by the $n/2$ term in the discrete result $x = (n^2 + n)/2$, which is the only thing that makes it different from the continuous result $x = t^2/2$.
Properties of the derivative

It follows immediately from the definition of the derivative that multiplying a function by a constant multiplies its derivative by the same constant, so for example since we know that the derivative of $t^2/2$ is $t$, we can immediately tell that the derivative of $t^2$ is $2t$, and the derivative of $t^2/17$ is $2t/17$.

Also, if we add two functions, their derivatives add. To give a good example of this, we need to have another function that we can differentiate, one that isn’t just some multiple of $t^2$. An easy one is $t$: the derivative of $t$ is 1, since the graph of $x = t$ is a line with a slope of 1, and the tangent line lies right on top of the original line.

Example  2 

The derivative of $5t^2 + 2t$ is the derivative of $5t^2$ plus the derivative of $2t$, since derivatives add. The derivative of $5t^2$ is 5 times the derivative of $t^2$, and the derivative of $2t$ is 2 times the derivative of $t$, so putting everything together, we find that the derivative of $5t^2 + 2t$ is $(5)(2t) + (2)(1) = 10t + 2$.

The derivative of a constant is zero, since a constant function’s graph is a horizontal line, with a slope of zero. We now know enough to differentiate any second-order polynomial.

Example  3 

> An insect pest from the United States is inadvertently released in a village in rural China. The pests spread outward at a rate of $s$ kilometers per year, forming a widening circle of contagion. Find the number of square kilometers per year that become newly infested. Check that the units of the result make sense. Interpret the result.

> Let $t$ be the time, in years, since the pest was introduced. The radius of the circle is $r = st$, and its area is $a = \pi r^2 = \pi (st)^2$. To make this look like a polynomial, we have to rewrite it as $a = (\pi s^2)t^2$. The derivative is

$$\dot{a} = (\pi s^2)(2t)$$
$$\dot{a} = (2\pi s^2)t$$

The units of $s$ are km/year, so squaring it gives km$^2$/year$^2$. The 2 and the $\pi$ are unitless, and multiplying by $t$ gives units of km$^2$/year, which is what we expect for $\dot{a}$, since it represents the number of square kilometers per year that become infested.

Interpreting the result, we notice a couple of things. First, the rate of infestation isn’t constant; it’s proportional to $t$, so people might not pay so much attention at first, but later on the effort required to combat the problem will grow more and more quickly. Second, we notice that the result is proportional to $s^2$. This suggests that anything that could be done to reduce $s$ would be very helpful. For instance, a measure that cut $s$ in half would reduce $\dot{a}$ by a factor of four.

Higher-order  polynomials 

So far, we have the following results for polynomials up to order 2:

function    derivative
$1$         $0$
$t$         $1$
$t^2$       $2t$

Interpreting 1 as $t^0$, we detect what seems to be a general rule, which is that the derivative of $t^k$ is $kt^{k-1}$. The proof is straightforward but not very illuminating if carried out with the methods developed in this chapter, so I’ve relegated it to page 140. It can be proved much more easily using the methods of chapter 2.

Example  4 

> If $x = 2t^7 - 4t + 1$, find $\dot{x}$.

> This is similar to example 2, the only difference being that we can now handle higher powers of $t$. The derivative of $t^7$ is $7t^6$, so we have

$$\dot{x} = (2)(7t^6) + (-4)(1) + 0 = 14t^6 - 4$$
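The rule for powers, combined with the rules for sums and constant multiples, is mechanical enough to write as a short function. In this sketch a polynomial is stored as a list of coefficients, with the entry at index $k$ multiplying $t^k$; that representation and the name `differentiate` are assumptions made for the illustration.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] meaning c0 + c1*t + c2*t^2 + ...

    Each term c_k * t^k becomes k * c_k * t^(k-1); the constant term drops out.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


# x = 2t^7 - 4t + 1, written from the constant term upward
x = [1, -4, 0, 0, 0, 0, 0, 2]
print(differentiate(x))  # coefficients of -4 + 14t^6
```

The same function reproduces example 2: `differentiate([0, 2, 5])` gives the coefficients of $10t + 2$.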


Example  5 

> Calculate $3^{-1}$ and $3.01^{-1}$. Does this seem consistent with a conjecture that the rule for differentiating $t^k$ holds for $k < 0$?

> We have $3^{-1} \approx 0.33333$ and $3.01^{-1} \approx 0.33223$, the difference being $-1.1 \times 10^{-3}$. This suggests that the graph of $x = 1/t$ has a tangent line at $t = 3$ with a slope of about

$$\frac{-1.1 \times 10^{-3}}{0.01} = -0.11.$$

If the rule for differentiating $t^k$ were to hold, then we would have $\dot{x} = -t^{-2}$, and evaluating this at $t = 3$ would give $-1/9$, which is indeed about $-0.11$. Yes, the rule does appear to hold for negative $k$, although this numerical check does not constitute a proof. A proof is given in example 10 on p. 27.
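The same numerical check takes a few lines of Python. Like the calculation in the example, it is only a plausibility check, and the variable names are invented for this sketch.

```python
def x(t):
    return 1 / t


rise = x(3.01) - x(3)   # change in x between the two nearby points
run = 0.01              # change in t
estimate = rise / run   # approximate slope of the tangent line at t = 3

conjecture = -1 * 3 ** -2  # what the rule k*t^(k-1) would give for k = -1

print(estimate, conjecture)
```

Both numbers are close to $-0.11$; the small gap between them is what one expects from using two points a finite distance apart rather than a true tangent line.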

The  second  derivative 

I described how Galileo and Newton found that an object subject to an external force, starting from rest, would have a velocity $\dot{x}$ that was proportional to $t$, and a position $x$ that varied like $t^2$. The proportionality constant for the velocity is called the acceleration, $a$, so that $\dot{x} = at$ and $x = at^2/2$. For example, a sports car accelerating from a stop sign would have a large acceleration, and its velocity at a given time would therefore be a large number. The acceleration can be thought of as the derivative of the derivative of $x$, written $\ddot{x}$, with two dots. In our example, $\ddot{x} = a$. In general, the acceleration doesn’t need to be constant. For example, the sports car will eventually have to stop accelerating, perhaps because the backward force of air friction becomes as great as the force pushing it forward. The total force acting on the car would then be zero, and the car would continue in motion at a constant speed.

Example  6 

Suppose the pilot of a blimp has just turned on the motor that runs its propeller, and the propeller is spinning up. The resulting force on the blimp is therefore increasing steadily, and let’s say that this causes the blimp to have an acceleration $\ddot{x} = 3t$, which increases steadily with time. We want to find the blimp’s velocity and position as functions of time.

For the velocity, we need a polynomial whose derivative is $3t$. We know that the derivative of $t^2$ is $2t$, so we need to use a function that’s bigger by a factor of 3/2: $\dot{x} = (3/2)t^2$. In fact, we could add any constant to this, and make it $\dot{x} = (3/2)t^2 + 14$, for example, where the 14 would represent the blimp’s initial velocity. But since the blimp has been sitting dead in the air until the motor started working, we can assume the initial velocity was zero. Remember, any time you’re working backwards like this to find a function whose derivative is some other function (integrating, in other words), there is the possibility of adding on a constant like this.

Finally, for the position, we need something whose derivative is $(3/2)t^2$. The derivative of $t^3$ would be $3t^2$, so we need something half as big as this: $x = t^3/2$.
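Working backwards from a derivative can also be written as a small routine for polynomials. In this sketch a polynomial is a list of coefficients with the entry at index $k$ multiplying $t^k$, and the constant that may be added is passed in explicitly; the name `antiderivative` and this representation are assumptions of the illustration, not something from the text.

```python
from fractions import Fraction


def antiderivative(coeffs, constant=0):
    """Return a polynomial whose derivative is the one given by coeffs.

    Each term c_k * t^k comes from c_k/(k+1) * t^(k+1); `constant` is the
    freely chosen constant term (for example, an initial velocity).
    """
    return [Fraction(constant)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


acceleration = [0, 3]                    # 3t
velocity = antiderivative(acceleration)  # (3/2) t^2, starting from rest
position = antiderivative(velocity)      # t^3 / 2, starting from x = 0

print(velocity)
print(position)
```

Passing `constant=14` in the first step would reproduce the variant with an initial velocity of 14 mentioned in the example.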

The second derivative can be interpreted as a measure of the curvature of the graph, as shown in figure k. The graph of the function $x = 2t$ is a line, with no curvature. Its first derivative is 2, and its second derivative is zero. The function $t^2$ has a second derivative of 2, and the more tightly curved function $7t^2$ has a bigger second derivative, 14.


k / The functions $2t$, $t^2$ and $7t^2$.

l / The functions $t^2$ and $3 - t^2$.

Positive and negative signs of the second derivative indicate concavity. In figure l, the function $t^2$ is like a cup with its mouth pointing up. We say that it’s “concave up,” and this corresponds to its positive second derivative. The function $3 - t^2$, with a second derivative less than zero, is concave down. Another way of saying it is that if you’re driving along a road shaped like $t^2$, going in the direction of increasing $t$, then your steering wheel is turned to the left, whereas on a road shaped like $3 - t^2$ it’s turned to the right.
m / The function $t^3$ has an inflection point at $t = 0$.

Figure m shows a third possibility. The function $t^3$ has a derivative $3t^2$ and a second derivative $6t$, which equals zero at $t = 0$. This is called a point of inflection. The concavity of the graph is down on the left side, up on the right. The inflection point is where it switches from one concavity to the other. In the alternative description in terms of the steering wheel, the inflection point is where your steering wheel is crossing from left to right.

1.3  Applications 

Maxima  and  minima 

When a function goes up and then smoothly turns around and comes back down again, it has zero slope at the top. A place where $\dot{x} = 0$, then, could represent a place where $x$ was at a maximum. On the other hand, it could be concave up, in which case we’d have a minimum. The term extremum refers to either a maximum or a minimum.

Example  7 


> Fred receives a mysterious e-mail tip telling him that his investment in a certain stock will have a value given by $x = -2t^4 + (6.4577 \times 10^{10})t$, where $t > 2005$ is the year. Should he sell at some point? If so, when?

> If the value reaches a maximum at some time, then the derivative should be zero then. Taking the derivative and setting it equal to zero, we have

$$0 = -8t^3 + 6.4577 \times 10^{10}$$
$$t = \left(\frac{6.4577 \times 10^{10}}{8}\right)^{1/3}$$
$$t = 2006.0.$$

A positive number has only one real cube root, so this is the only candidate, and it lies within the range $t > 2005$ for which the tip claimed the function would be valid.

Should  Fred  sell  on  New  Year’s  eve  of 
2006? 

But this could be a maximum, a minimum, or an inflection point. Fred definitely does not want to sell at $t = 2006$ if it’s a minimum! To check which of the three possibilities holds, Fred takes the second derivative:

$$\ddot{x} = -24t^2.$$

Plugging in $t = 2006.0$, we find that the second derivative is negative at that time, so it is indeed a maximum.
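Fred's arithmetic can be reproduced as follows. This sketch simply evaluates the real cube root and the sign of the second derivative; the function names `x_dot` and `x_ddot` are invented here to stand for the first and second derivatives.

```python
def x_dot(t):
    """First derivative of x = -2t^4 + (6.4577e10)t."""
    return -8 * t ** 3 + 6.4577e10


def x_ddot(t):
    """Second derivative of x."""
    return -24 * t ** 2


# Solve x_dot(t) = 0 for the real root.
t_star = (6.4577e10 / 8) ** (1 / 3)

print(round(t_star, 1))    # the year at which the slope is zero
print(x_ddot(t_star) < 0)  # a negative second derivative indicates a maximum
```

The computed time rounds to 2006.0, and the second derivative there is negative, matching the conclusion of the example.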

Implicit in this whole discussion was the assumption that the maximum or minimum occurred where the function was smooth. There are some other possibilities.

In figure n, the function’s minimum occurs at an end-point of its domain.
n / The function $x = \sqrt{t}$ has a minimum at $t = 0$, which is not a place where $\dot{x} = 0$. This point is the edge of the function’s domain.

Another possibility is that the function can have a minimum or maximum at some point where its derivative isn’t well defined. Figure o shows such a situation. There is a kink in the function at $t = 0$, so a wide variety of lines could be placed through the graph there, all with different slopes and all staying on one side of the graph. There is no uniquely defined tangent line, so the derivative is undefined.

Example  8 

> Rancher Rick has a length of cyclone fence $L$ with which to enclose a rectangular pasture. Show that he can enclose the greatest possible area by forming a square with sides of length $L/4$.

> If the width and length of the rectangle are $t$ and $u$, and Rick is going to use up all his fencing material, then the perimeter of the rectangle, $2t + 2u$, equals $L$, so for a given width, $t$, the length is $u = L/2 - t$. The area is $a = tu = t(L/2 - t)$. The function only means anything realistic for $0 \le t \le L/2$, since for values of $t$ outside this region either the width or the height of the rectangle would be negative. The function $a(t)$ could therefore have a maximum either at a place where $\dot{a} = 0$, or at the endpoints of the function’s domain. We can eliminate the latter possibility, because the area is zero at the endpoints.

o / The function $x = |t|$ has a minimum at $t = 0$, which is not a place where $\dot{x} = 0$. This is a point where the function isn’t differentiable.

To evaluate the derivative, we first need to reexpress $a$ as a polynomial:

$$a = -t^2 + \frac{L}{2}t.$$

The derivative is

$$\dot{a} = -2t + \frac{L}{2}.$$

Setting this equal to zero, we find $t = L/4$, as claimed. This is a maximum, not a minimum or an inflection point, because the second derivative is the constant $\ddot{a} = -2$, which is negative for all $t$, including $t = L/4$.
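A brute-force scan over possible widths gives the same answer without any calculus. The fence length of 100 and the number of trial widths are arbitrary values picked for this sketch, and `area` is a name introduced only here.

```python
def area(t, L):
    """Area of a rectangle of width t when the total fence length is L."""
    return t * (L / 2 - t)


L = 100.0     # hypothetical fence length, in any unit
steps = 1000  # number of intervals between width 0 and width L/2

# Try evenly spaced widths across the whole allowed range, endpoints included.
widths = [i * (L / 2) / steps for i in range(steps + 1)]
best = max(widths, key=lambda t: area(t, L))

print(best, L / 4, area(best, L))
```

The best width found by the scan coincides with $L/4$, and the area at both endpoints is zero, as the example argues.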
Propagation of errors

The Women’s National Basketball Association says that balls used in its games should have a radius of 11.6 cm, with an allowable range of error of plus or minus 0.1 cm (one millimeter). How accurately can we determine the ball’s volume?

p / How accurately can we determine the ball’s volume?


The equation for the volume of a sphere gives $V = (4/3)\pi r^3 = 6538$ cm$^3$ (about six and a half liters). We have a function $V(r)$, and we want to know how much of an effect will be produced on the function’s output $V$ if its input $r$ is changed by a certain small amount. Since the amount by which $r$ can be changed is small compared to $r$, it’s reasonable to take the tangent line as an approximation to the actual graph. The slope of the tangent line is the derivative of $V$, which is $4\pi r^2$. (This is the ball’s surface area.) Setting (slope) = (rise)/(run) and solving for the rise, which represents the change in $V$, we find that it could be off by as much as $(4\pi r^2)(0.1 \text{ cm}) = 170$ cm$^3$. The volume of the ball can therefore be expressed as $6500 \pm 170$ cm$^3$, where the original figure of 6538 has been rounded off to the nearest hundred in order to avoid creating the impression that the 3 and the 8 actually mean anything; they clearly don’t, since the possible error is out in the hundreds’ place.

This calculation is an example of a very common situation that occurs in the sciences, and even in everyday life, in which we base a calculation on a number that has some range of uncertainty in it, causing a corresponding range of uncertainty in the final result. This is called propagation of errors. The idea is that the derivative expresses how sensitive the function’s output is to its input.

The example of the basketball could also have been handled without calculus, simply by recalculating the volume using a radius that was raised from 11.6 to 11.7 cm, and finding the difference between the two volumes. Understanding it in terms of calculus, however, gives us a different way of getting at the same ideas, and often allows us to understand more deeply what’s going on. For example, we noticed in passing that the derivative of the volume was simply the surface area of the ball, which provides a nice geometric visualization. We can imagine inflating the ball so that its radius is increased by a millimeter. The amount of added volume equals the surface area of the ball multiplied by one millimeter, just as the amount of volume added to the world’s oceans by global warming equals the oceans’ surface area multiplied by the added depth.
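
The two approaches to the basketball can be compared side by side. The following is a minimal sketch using only the numbers given in the text (r = 11.6 cm, error 0.1 cm); the function and variable names are made up for illustration.

```python
import math

r = 11.6    # nominal radius in cm, from the text
dr = 0.1    # allowed error in the radius, in cm

def volume(radius):
    """Volume of a sphere of the given radius."""
    return (4.0 / 3.0) * math.pi * radius**3

# Tangent-line estimate: (slope) x (run), with slope dV/dr = 4 pi r^2.
tangent_estimate = 4.0 * math.pi * r**2 * dr

# Direct recalculation with the radius raised from 11.6 to 11.7 cm.
direct_difference = volume(r + dr) - volume(r)

print(round(volume(r)), round(tangent_estimate, 1), round(direct_difference, 1))
```

The two error estimates differ by only about one percent, which is far below the precision that an error estimate needs, so the tangent-line shortcut costs essentially nothing here.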

For an example of an insight that we would have missed if we hadn’t applied calculus, consider how much error is incurred in the measurement of the width of a book if the ruler is placed on the book at a slightly incorrect angle, so that it doesn’t form an angle of exactly 90 degrees with the spine. The measurement has its minimum (and correct) value if the ruler is placed at exactly 90 degrees. Since the function has a minimum at this angle, its derivative is zero. That means that we expect essentially no error in the measurement if the ruler’s angle is just a tiny bit off. This gives us the insight that it’s not worth fiddling excessively over the angle in this measurement. Other sources of error will be more important. For example, is the book a uniform rectangle? Are we using the worn end of the ruler as its zero, rather than letting the ruler hang over both sides of the book and subtracting the two measurements?
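
To put rough numbers on the ruler example, here is a small sketch. It assumes a simple model that the text doesn’t spell out: a ruler tilted by an angle θ away from perpendicular reads w/cos θ for a book of true width w. The width of 20 cm is a hypothetical value.

```python
import math

w = 20.0   # true width of the book in cm (hypothetical value)

def measured(theta_deg):
    """Reading on a ruler tilted theta_deg away from perpendicular to the spine.

    Assumed model: the ruler spans the book along a slanted line of length w/cos(theta).
    """
    return w / math.cos(math.radians(theta_deg))

for theta in (1.0, 2.0, 4.0):
    error = measured(theta) - w
    print(f"{theta:3.0f} deg off: error = {error:.4f} cm")
```

Under this model a one-degree tilt changes the reading by only a few thousandths of a centimeter, and doubling the tilt roughly quadruples the error, which is the signature of a function whose derivative vanishes at the point of interest.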


Problems

1 Graph the function $t^2$ in the neighborhood of t = 3, draw a tangent line, and use its slope to verify that the derivative equals 2t at this point.
> Solution, p. 166

2 Graph the function $\sin e^t$ in the neighborhood of t = 0, draw a tangent line, and use its slope to estimate the derivative. Answer: 0.5403023058. (You will of course not get an answer this precise using this technique.)
> Solution, p. 166

3 Differentiate the following functions with respect to t: 1, 7, t, 7t, $t^2$, $7t^2$, $t^3$, $7t^3$.
> Solution, p. 167

4 Differentiate $3t^7 - 4t^2 + 6$ with respect to t.
> Solution, p. 167

5 Differentiate $at^2 + bt + c$ with respect to t.
> Solution, p. 167 [Thompson, 1919]

6 Find two different functions whose derivatives are the constant 3, and give a geometrical interpretation.
> Solution, p. 167

7 Find a function x whose derivative is $\dot{x} = t^7$. In other words, integrate the given function.
> Solution, p. 168

8 Find a function x whose derivative is $\dot{x} = 3t^7$. In other words, integrate the given function.
> Solution, p. 168

9 Find a function x whose derivative is $\dot{x} = 3t^7 - 4t^2 + 6$. In other words, integrate the given function.
> Solution, p. 168

10 Let t be the time that has elapsed since the Big Bang. In that time, one would imagine that light, traveling at speed c, has been able to travel a maximum distance ct. (In fact the distance is several times more than this, because according to Einstein’s theory of general relativity, space itself has been expanding while the ray of light was in transit.) The portion of the universe that we can observe would then be a sphere of radius ct, with volume $v = (4/3)\pi r^3 = (4/3)\pi(ct)^3$. Compute the rate $\dot{v}$ at which the volume of the observable universe is increasing, and check that your answer has the right units, as in example 3 on page 14.
> Solution, p. 168

11 Kinetic energy is a measure of an object’s quantity of motion; when you buy gasoline, the energy you’re paying for will be converted into the car’s kinetic energy (actually only some of it, since the engine isn’t perfectly efficient). The kinetic energy of an object with mass m and velocity v is given by $K = (1/2)mv^2$. For a car accelerating at a steady rate, with v = at, find the rate $\dot{K}$ at which the engine is required to put out kinetic energy. $\dot{K}$, with units of energy over time, is known as the power. Check that your answer has the right units, as in example 3 on page 14.
> Solution, p. 168

12 A metal square expands and contracts with temperature, the lengths of its sides varying according to the equation $\ell = (1 + \alpha T)\ell_0$. Find the rate of change of its surface area a with respect to temperature. That is, find $\dot{a}$, where the variable with respect to which you’re differentiating is the temperature, T. Check that your answer has the right units, as in example 3 on page 14.
> Solution, p. 169

13 Find the second derivative of $2t^3 - t$.
> Solution, p. 169

14 Locate any points of inflection of the function $t^3 + t^2$. Verify by graphing that the concavity of the function reverses itself at this point.
> Solution, p. 169

15 Let’s see if the rule that the derivative of $t^k$ is $kt^{k-1}$ also works for k < 0. Use a graph to test one particular case, choosing one particular negative value of k, and one particular value of t. If it works, what does that tell you about the rule? If it doesn’t work?
> Solution, p. 169

16 Two atoms will interact via electrical forces between their protons and electrons. To put them at a distance r from one another (measured from nucleus to nucleus), a certain amount of energy E is required, and the minimum energy occurs when the atoms are in equilibrium, forming a molecule. Often a fairly good approximation to the energy is the Lennard-Jones expression

$$E(r) = k\left[\left(\frac{a}{r}\right)^{12} - 2\left(\frac{a}{r}\right)^{6}\right],$$

where k and a are constants. Note that, as proved in chapter 2, the rule that the derivative of $t^k$ is $kt^{k-1}$ also works for k < 0. Show that there is an equilibrium at r = a. Verify (either by graphing or by testing the second derivative) that this is a minimum, not a maximum or a point of inflection.
> Solution, p. 171

17 Prove that the total number of maxima and minima possessed by a third-order polynomial is at most two.
> Solution, p. 172

18 Functions f and g are defined on the whole real line, and are differentiable everywhere. Let s = f + g be their sum. In what ways, if any, are the extrema of f, g, and s related?
> Solution, p. 172

19 Euclid proved that the volume of a pyramid equals (1/3)bh, where b is the area of its base, and h its height. A pyramidal tent without tent-poles is erected by blowing air into it under pressure. The area of the base is easy to measure accurately, because the base is nailed down, but the height fluctuates somewhat and is hard to measure accurately. If the amount of uncertainty in the measured height is plus or minus $\epsilon_h$, find the amount of possible error $\epsilon_V$ in the volume.
> Solution, p. 173




20 A hobbyist is going to measure the height to which her model rocket rises at the peak of its trajectory. She plans to take a digital photo from far away and then do trigonometry to determine the height, given the baseline from the launchpad to the camera and the angular height of the rocket as determined from analysis of the photo. Comment on the error incurred by the inability to snap the photo at exactly the right moment.
> Solution, p. 173

21 Prove, as claimed on p. 10, that if the sum $1^2 + 2^2 + \ldots + n^2$ is a polynomial, it must be of third order, and the coefficient of the $n^3$ term must be 1/3.
> Solution, p. 173

2 To infinity — and beyond!


a  /  Gottfried  Leibniz 
(1646-1716) 


Little kids readily pick up the idea of infinity. “When I grow up, I’m gonna have a million Barbies.” “Oh yeah? Well, I’m gonna have a billion.” “Well, I’m gonna have infinity Barbies.” “So what? I’ll have two infinity of them.” Adults laugh, convinced that infinity, ∞, is the biggest number, so 2∞ can’t be any bigger. This is the idea behind a joke in the movie Toy Story. Buzz Lightyear’s slogan is “To infinity — and beyond!” We assume there isn’t any beyond. Infinity is supposed to be the biggest there is, so by definition there can’t be anything bigger, right?

2.1  Infinitesimals 

Actually mathematicians have invented many different logical systems for working with infinity, and in most of them infinity does come in different sizes and flavors. Newton, as well as the German mathematician Leibniz who invented calculus independently,¹ had a strong intuitive idea that calculus was really about numbers that were infinitely small: infinitesimals, the opposite of infinities. For instance, consider the number $1.1^2 = 1.21$. That 2 in the first decimal place is the same 2 that appears in the expression 2t for the derivative of $t^2$.


b / A close-up view of the function $x = t^2$, showing the line that connects the points (1,1) and (1.1,1.21).


1  There  is  some  dispute  over  this  point. 
Newton  and  his  supporters  claimed  that 

Leibniz  plagiarized  Newton’s  ideas,  and 
merely  invented  a  new  notation  for  them. 

Figure b shows the idea visually. The line connecting the points (1,1) and (1.1,1.21) is almost indistinguishable from the tangent line on this scale. Its slope is (1.21 - 1)/(1.1 - 1) = 2.1, which is very close to the tangent line’s slope of 2. It was a good approximation because the points were close together, separated by only 0.1 on the t axis.

If we needed a better approximation, we could try calculating $1.01^2 = 1.0201$. The slope of the line connecting the points (1,1) and (1.01,1.0201) is 2.01, which is even closer to the slope of the tangent line.
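
The pattern of slopes 2.1, 2.01, and so on can be reproduced with a few lines of code. This is only a sketch; the helper names are invented for illustration, and ordinary floating-point arithmetic adds tiny rounding errors to the printed values.

```python
def secant_slope(f, t, dt):
    """Slope of the line through (t, f(t)) and (t + dt, f(t + dt))."""
    return (f(t + dt) - f(t)) / dt

def square(t):
    return t * t

# Shrink the separation between the two points and watch the slope approach 2.
for dt in (0.1, 0.01, 0.001):
    print(dt, secant_slope(square, 1.0, dt))
```

Each slope exceeds 2 by an amount equal to the separation between the points, apart from rounding.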

Another method of visualizing the idea is that we can interpret $x = t^2$ as the area of a square with sides of length t, as suggested in figure c. We increase t by an infinitesimally small number dt. The d is Leibniz’s notation for a very small difference, and dt is to be read as a single symbol, “dee-tee,” not as a number d multiplied by a number t. The idea is that dt is smaller than any ordinary number you could imagine, but it’s not zero. The area of the square is increased by $dx = 2t\,dt + dt^2$, which is analogous to the finite numbers 0.21 and 0.0201 we calculated earlier. Where before we divided by a finite change in t such as 0.1 or 0.01, now we divide by dt, producing

$$\frac{dx}{dt} = \frac{2t\,dt + dt^2}{dt} = 2t + dt$$

for the derivative. On a graph like figure b, dx/dt is the slope of the tangent line: the change in x divided by the change in t.

c / A geometrical interpretation of the derivative of $t^2$.

But adding an infinitesimal number dt onto 2t doesn’t really change it by any amount that’s even theoretically measurable in the real world, so the answer is really 2t. Evaluating it at t = 1 gives the exact result, 2, that the earlier approximate results, 2.1 and 2.01, were getting closer and closer to.

Example  9 

To show the power of infinitesimals and the Leibniz notation, let’s prove that the derivative of $t^3$ is $3t^2$:

$$\frac{dx}{dt} = \frac{(t + dt)^3 - t^3}{dt} = \frac{3t^2\,dt + 3t\,dt^2 + dt^3}{dt} = 3t^2 + \ldots,$$

where the dots indicate infinitesimal terms that we can neglect.
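
The step from the first expression to the second uses the binomial expansion of the cube, which is spelled out here for reference:

```latex
\begin{align*}
(t + dt)^3 &= t^3 + 3t^2\,dt + 3t\,dt^2 + dt^3 \\
(t + dt)^3 - t^3 &= 3t^2\,dt + 3t\,dt^2 + dt^3 \\
\frac{(t + dt)^3 - t^3}{dt} &= 3t^2 + 3t\,dt + dt^2
\end{align*}
```

The last two terms each still contain a factor of dt, so they are the infinitesimal terms that the dots stand for.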

This result required significant sweat and ingenuity when proved on page 140 by the methods of chapter 1. Not only that, but the old approach would have required a completely different method of proof for a function that wasn’t a polynomial, whereas the new one can be applied more generally, as we’ll see presently in examples 10-13.

It’s easy to get the mistaken impression that infinitesimals exist in some remote fairyland where we can never touch them. This may be true in the same artsy-fartsy sense that we can never truly understand √2, because its decimal expansion goes on forever, and we therefore can never compute it exactly. But in practical work, that doesn’t stop us from working with √2. We just approximate it as, e.g., 1.41. Infinitesimals are no more or less mysterious than irrational numbers, and in particular we can represent them concretely on a computer. If you go to lightandmatter.com/calc/inf, you’ll find a web-based calculator called Inf, which can handle infinite and infinitesimal numbers. It has a built-in symbol, d, which represents an infinitesimally small number such as the dx’s and dt’s we’ve been handling symbolically.

Let’s use Inf to verify that the derivative of $t^3$, evaluated at t = 1, is equal to 3, as found by plugging in to the result of example 9. The : symbol is the prompt that shows you Inf is ready to accept your typed input.

: ((1+d)^3-1)/d

3+3d+d^2

As claimed, the result is 3, or close enough to 3 that the infinitesimal error doesn’t matter in real life. It might look like Inf did this example by using algebra to simplify the expression, but in fact Inf doesn’t know anything about algebra. One way to see this is to use Inf to compare d with various real numbers:

: d<1

true

: d<0.01

true

: d<0.0000001

true

: d<0

false

If d were just a variable being treated according to the axioms of algebra, there would be no way to tell how it compared with other numbers without having some special information. Inf doesn’t know algebra, but it does know that d is a positive number that is less than any positive real number that can be represented using decimals or scientific notation.
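
To make the idea of representing infinitesimals on a computer a little more concrete, here is a toy sketch. It is not how Inf is actually implemented, which the text doesn’t describe; it simply assumes that a number can be stored as a short list of coefficients of 1, d, d^2, and so on, with d treated as positive and smaller than every positive real. The class name `Hyper`, the method `over_d`, and the truncation depth are all invented for illustration.

```python
from fractions import Fraction

ORDER = 5  # keep the coefficients of 1, d, ..., d**4; an arbitrary cutoff

class Hyper:
    """Toy number c0 + c1*d + c2*d**2 + ..., with d a positive infinitesimal."""

    def __init__(self, coeffs):
        kept = [Fraction(c) for c in coeffs][:ORDER]
        self.c = kept + [Fraction(0)] * (ORDER - len(kept))

    @staticmethod
    def lift(x):
        """Turn an ordinary number into a Hyper with no infinitesimal part."""
        return x if isinstance(x, Hyper) else Hyper([x])

    def __add__(self, other):
        other = Hyper.lift(other)
        return Hyper([a + b for a, b in zip(self.c, other.c)])
    __radd__ = __add__

    def __sub__(self, other):
        other = Hyper.lift(other)
        return Hyper([a - b for a, b in zip(self.c, other.c)])

    def __rsub__(self, other):
        return Hyper.lift(other) - self

    def __mul__(self, other):
        other = Hyper.lift(other)
        out = [Fraction(0)] * ORDER
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                if i + j < ORDER:          # drop powers beyond the cutoff
                    out[i + j] += a * b
        return Hyper(out)
    __rmul__ = __mul__

    def __pow__(self, n):
        result = Hyper([1])
        for _ in range(n):                 # repeated multiplication, n >= 0
            result = result * self
        return result

    def over_d(self):
        """Divide by d; only handled when there is no real (d**0) part."""
        if self.c[0] != 0:
            raise ValueError("quotient would be infinite")
        return Hyper(self.c[1:])

    def __lt__(self, other):
        diff = Hyper.lift(other) - self
        # Each power of d is infinitely smaller than the one before it,
        # so the lowest-order nonzero coefficient decides the sign.
        for coeff in diff.c:
            if coeff != 0:
                return coeff > 0
        return False

d = Hyper([0, 1])

print(((1 + d)**3 - 1).over_d().c[:3])              # coefficients of 1, d, d^2
print(d < 1, d < Fraction(1, 100), d < Fraction(1, 10**7), d < 0)
```

The comparison rule is the interesting part: no choice of a positive real number, however small, can make the test `d < x` fail, because the real part of the difference always dominates.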

Example  10 

In example 5 on p. 15, we made a rough numerical check to see if the differentiation rule $t^k \to kt^{k-1}$, which was proved on p. 140 for k = 1, 2, 3, ..., was also valid for k = -1, i.e., for the function x = 1/t. Let’s look for an actual proof. To find a natural method of attack, let’s first redo the numerical check in a slightly more suggestive form. Again approximating the derivative at t = 3, we have

$$\frac{dx}{dt} \approx \left(\frac{1}{3.01} - \frac{1}{3}\right)\left(\frac{1}{0.01}\right).$$

Let’s apply the grade-school technique for subtracting fractions, in which we first get them over the same denominator:

$$\frac{1}{3.01} - \frac{1}{3} = \frac{3 - 3.01}{3 \times 3.01}.$$

The result is

$$\frac{dx}{dt} \approx \left(\frac{-0.01}{3 \times 3.01}\right)\left(\frac{1}{0.01}\right) = -\frac{1}{3 \times 3.01}.$$

Replacing 3 with t and 0.01 with dt, this becomes

$$\frac{dx}{dt} = -\frac{1}{t(t + dt)} = -t^{-2} + \ldots$$
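
The closed form for the quotient in example 10 can be checked exactly, without rounding, by using rational arithmetic. This is a minimal sketch with the same numbers as the example (t = 3 and a step of 0.01); the function name is invented for illustration.

```python
from fractions import Fraction

def difference_quotient(t, dt):
    """Change in 1/t divided by the change in t, computed exactly."""
    return (1 / (t + dt) - 1 / t) / dt

t = Fraction(3)
dt = Fraction(1, 100)

lhs = difference_quotient(t, dt)
rhs = -1 / (t * (t + dt))       # the closed form -1/(t(t + dt))

print(lhs, rhs, float(lhs))     # the two fractions agree exactly
```

The decimal value is close to -1/9, which is what the rule gives for k = -1 at t = 3.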


Example  11 

The derivative of x = sin t, with t in units of radians, is

$$\frac{dx}{dt} = \frac{\sin(t + dt) - \sin t}{dt},$$

and with the trig identity $\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$, this becomes

$$= \frac{\sin t\cos dt + \cos t\sin dt - \sin t}{dt}.$$

Applying the small-angle approximations sin u ≈ u and cos u ≈ 1, we have

$$\frac{dx}{dt} = \frac{\cos t\,dt}{dt} + \ldots = \cos t + \ldots,$$

where “...” represents the error caused by the small-angle approximations.

d / Graphs of sin t, and its derivative cos t.

This is essentially all there is to the computation of the derivative, except for the remaining technical point that we haven’t proved that the small-angle approximations are good enough. In example 9 on page 26, when we calculated the derivative of $t^3$, the resulting expression for the quotient dx/dt came out in a form in which we could inspect the “...” terms and verify before discarding them that they were infinitesimal. The issue is less trivial in the present example. This point is addressed more rigorously on page 141.
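
A numerical experiment can’t settle the technical point, but it does suggest that the neglected terms behave as hoped. The sketch below uses finite steps in place of dt and an arbitrarily chosen test point t = 1 radian; the function name is invented for illustration.

```python
import math

def sine_quotient(t, dt):
    """Change in sin t divided by the change in t, for a finite step dt."""
    return (math.sin(t + dt) - math.sin(t)) / dt

t = 1.0   # arbitrary test point, in radians

for dt in (0.1, 0.01, 0.001):
    error = sine_quotient(t, dt) - math.cos(t)
    # The error shrinks roughly in proportion to dt.
    print(f"dt = {dt}: quotient - cos(t) = {error:.6f}")
```

Each tenfold reduction in the step cuts the discrepancy by about a factor of ten, which is consistent with an error term that carries at least one factor of dt.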

Figure d shows the graphs of the function and its derivative. Note how the two graphs correspond. At t = 0, the slope of sin t is at its largest, and is positive; this is where the derivative, cos t, attains its maximum positive value of 1. At t = π/2, sin t has reached a maximum, and has a slope of zero; cos t is zero here. At t = π, in the middle of the graph, sin t has its maximum negative slope, and cos t is at its most negative extreme of -1.

Physically, sin t could represent the position of a pendulum as it moved back and forth from left to right, and cos t would then be the pendulum’s velocity.

Example  12 

What about the derivative of the cosine? The cosine and the sine are really the same function, shifted to the left or right by π/2. If the derivative of the sine is the same as itself, but shifted to the left by π/2, then the derivative of the cosine must be a cosine shifted to the left by π/2:

$$\frac{d\cos t}{dt} = \cos(t + \pi/2) = -\sin t.$$
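
The last equality can be checked with the angle-addition identity for the cosine, a standard identity that isn’t quoted in the text:

```latex
\begin{align*}
\cos(t + \pi/2) &= \cos t \cos(\pi/2) - \sin t \sin(\pi/2) \\
                &= (\cos t)(0) - (\sin t)(1) \\
                &= -\sin t
\end{align*}
```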

The next example will require a little trickery. By the end of this chapter you’ll learn general techniques for cranking out any derivative cookbook-style, without having to come up with any tricks.

Example 13

> Find the derivative of 1/(1 - t), evaluated at t = 0.

> The graph shows what the function looks like. It blows up to infinity at t = 1, but it’s well behaved at t = 0, where it has a positive slope.

For insight, let’s calculate some points on the curve. The point at which we’re differentiating is (0,1). If we put in a small, positive value of t, we can observe how much the result increases relative to 1, and this will give us an approximation to the derivative. For example, we find that at t = 0.001, the function has the value 1.001001001001, and so the derivative is approximately (1.001 - 1)/(.001 - 0), or about 1. We can therefore conjecture that the derivative is exactly 1, but that’s not the same as proving it.

e / The function x = 1/(1 - t).

But let’s take another look at that number 1.001001001001. It’s clearly a repeating decimal. In other words, it appears that

$$\frac{1}{1 - 1/1000} = 1 + \frac{1}{1000} + \left(\frac{1}{1000}\right)^2 + \ldots,$$

and we can easily verify this by multiplying both sides of the equation by 1 - 1/1000 and collecting like powers. This is a special case of the geometric series

$$\frac{1}{1 - t} = 1 + t + t^2 + \ldots,$$

which can be derived² by doing synthetic division (the equivalent of long division for polynomials), or simply verified, after forming the conjecture based on the numerical example above, by multiplying both sides by 1 - t.

As we’ll see in section 2.2, and have been implicitly assuming so far, infinitesimals obey all the same elementary laws of algebra as the real numbers, so the above derivation also holds for an infinitesimal value of t. We can verify the result using Inf:

: 1/(1-d)

1+d+d^2+d^3+d^4

Notice, however, that the series is truncated after the first five terms. This is similar to the truncation that happens when you ask your calculator to find √2 as a decimal.

The result for the derivative is

$$\frac{dx}{dt} = \frac{(1 + dt + dt^2 + \ldots) - 1}{1 + dt - 1} = 1 + \ldots.$$

2 As a technical aside, it’s not necessary for our present purposes to go into the issue of how to make the most general possible definition of what is meant by a sum like this one which has an infinite number of terms; the only fact we’ll need here is that the error in the finite sum obtained by leaving out the “...” has only higher powers of t. This is taken up in more detail in ch. 7. Note that the series only gives the right answer for |t| < 1. E.g., for t = 1, it equals 1 + 1 + 1 + ..., which, if it means anything, clearly means something infinite.

2.2 Safe use of infinitesimals

The idea of infinitesimally small numbers has always irked purists.


f / Bishop George Berkeley (1685-1753)


One prominent critic of the calculus was Newton’s contemporary George Berkeley, the Bishop of Cloyne. Although some of his complaints are clearly wrong (he denied the possibility of the second derivative), there was clearly something to his criticism of the infinitesimals. He wrote sarcastically, “They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them ghosts of departed quantities?”

Infinitesimals seemed scary, because if you mishandled them, you could prove absurd things. For example, let du be an infinitesimal. Then 2du is also infinitesimal. Therefore both 1/du and 1/(2du) equal infinity, so 1/du = 1/(2du). Multiplying by du on both sides, we have a proof that 1 = 1/2.

In the eighteenth century, the use of infinitesimals became like adultery: commonly practiced, but shameful to admit to in polite circles. Those who used them learned certain rules of thumb for handling them correctly. For instance, they would identify the flaw in my proof of 1 = 1/2 as my assumption that there was only one size of infinity, when actually 1/du should be interpreted as an infinity twice as big as 1/(2du). The use of the symbol ∞ played into this trap, because the use of a single symbol for infinity implied that infinities only came in one size. However, the practitioners of infinitesimals had trouble articulating a clear set of principles for their proper use, and couldn’t prove that a self-consistent system could be built around them.

By the twentieth century, when I learned calculus, a clear consensus had formed that infinite and infinitesimal numbers weren’t numbers at all. A notation like dx/dt, my calculus teacher told me, wasn’t really one number divided by another, it was merely a symbol for something called a limit,

$$\lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t},$$

where Δx and Δt represented finite changes. I’ll give a formal definition (actually two different formal definitions) of the term “limit” in section 3.2, but intuitively the concept is that we can get as good an approximation to the derivative as we like, provided that we make Δt small enough.

That satisfied me until we got to a certain topic (implicit differentiation) in which we were encouraged to break the dx away from the dt, leaving them on opposite sides of the equation. I buttonholed my teacher after class and asked why he was now doing what he’d told me you couldn’t really do, and his response was that dx and dt weren’t really numbers, but most of the time you could get away with treating them as if they were, and you would get the right answer in the end. Most of the time!? That bothered me. How was I supposed to know when it wasn’t “most of the time?”


g  /  Abraham  Robinson 
(1918-1974) 


But unknown to me and my teacher, mathematician Abraham Robinson had already shown in the 1960’s that it was possible to construct a self-consistent number system that included infinite and infinitesimal numbers. He called it the hyperreal number system, and it included the real numbers as a subset.³

3 The main text of this book treats infinitesimals with the minimum fuss necessary in order to avoid the common goofs. More detailed discussions are often relegated to the back of the book, as in example 11 on page 28. The reader who wants to learn even more about the hyperreal system should consult the list of further reading on page 201.

Moreover, the rules for what you can and can’t do with the hyperreals turn out to be extremely simple. Take any true statement about the real numbers. Suppose it’s possible to translate it into a statement about the hyperreals in the most obvious way, simply by replacing the word “real” with the word “hyperreal.” Then the translated statement is also true. This is known as the transfer principle.

Let’s look back at my bogus proof of 1 = 1/2 in light of this simple principle. The final step of the proof, for example, is perfectly valid: multiplying both sides of the equation by the same thing. The following statement about the real numbers is true:

For any real numbers a, b, and c, if a = b, then ac = bc.

This can be translated in an obvious way into a statement about the hyperreals:

For any hyperreal numbers a, b, and c, if a = b, then ac = bc.

However, what about the statement that both 1/du and 1/(2du) equal infinity, so they’re equal to each other? This isn’t the translation of a statement that’s true about the reals, so there’s no reason to believe it’s true when applied to the hyperreals, and in fact it’s false.

What the transfer principle tells us is that the real numbers as we normally think of them are not unique in obeying the ordinary rules of algebra. There are completely different systems of numbers, such as the hyperreals, that also obey them.

How, then, are the hyperreals even different from the reals, if everything that’s true of one is true of the other? But recall that the transfer principle doesn’t guarantee that every statement about the reals is also true of the hyperreals. It only works if the statement about the reals can be translated into a statement about the hyperreals in the most simple, straightforward way imaginable, simply by replacing the word “real” with the word “hyperreal.” Here’s an example of a true statement about the reals that can’t be translated in this way:

For any real number a, there is an integer n that is greater than a.

This one can’t be translated so simplemindedly, because it refers to a subset of the reals called the integers. It might be possible to translate it somehow, but it would require some insight into the correct way to translate that word “integer.” The transfer principle doesn’t apply to this statement, which indeed is false for the hyperreals, because the hyperreals contain infinite numbers that are greater than all the integers. In fact, the contradiction of this statement can be taken as a definition of what makes the hyperreals special, and different from the reals: we assume that there is at least one hyperreal number, H, which is greater than all the integers.

As an analogy from everyday life, consider the following statements about the student body of the high school I attended:

1. Every student at my high school had two eyes and a face.

2. Every student at my high school who was on the football team was a jerk.

Let’s try to translate these into statements about the population of California in general. The student body of my high school is like the set of real numbers, and the present-day population of California is like the hyperreals. Statement 1 can be translated mindlessly into a statement that every Californian has two eyes and a face; we simply substitute “every Californian” for “every student at my high school.” But statement 2 isn’t so easy, because it refers to the subset of students who were on the football team, and it’s not obvious what the corresponding subset of Californians would be. Would it include everybody who played high school, college, or pro football? Maybe it shouldn’t include the pros, because they belong to an organization covering a region bigger than California. Statement 2 is the kind of statement that the transfer principle doesn’t apply to.⁴

Example  14 

As a nontrivial example of how to apply the transfer principle, let’s consider
how to handle expressions like the one that occurred when we wanted to
differentiate $t^2$ using infinitesimals:

$$\frac{d(t^2)}{dt} = 2t + dt.$$

I argued earlier that 2t + dt is so close to 2t that for all practical purposes, the
answer is really 2t. But is it really valid in general to say that 2t + dt is the
same hyperreal number as 2t? No. We can apply the transfer principle to
the following statement about the reals:

For any real numbers a and b, with $b \neq 0$, $a + b \neq a$.

Since dt isn’t zero, $2t + dt \neq 2t$.

More generally, example 14 leads us to visualize every number as being
surrounded by a “halo” of numbers that don’t equal it, but differ from it
by only an infinitesimal amount. Just as a magnifying glass would allow
you to see the fleas on a dog, you would need an infinitely strong
microscope to see this halo. This is similar to the idea that every
integer is surrounded by a bunch of fractions that would round off to
that integer. We can define the standard part of a finite hyperreal
number, which means the unique real number that differs from it
infinitesimally. For instance, the standard part of 2t + dt, notated
st(2t + dt), equals 2t. The derivative of a function should actually be
defined as the standard part of dx/dt, but we often write dx/dt to mean
the derivative, and don’t worry about the distinction.

4 For a slightly more precise and formal statement of the transfer
principle, see page 143.

One of the things Bishop Berkeley disliked about infinitesimals was
the idea that they existed in a kind of hierarchy, with $dt^2$ being
not just infinitesimally small, but infinitesimally small compared to
the infinitesimal dt. If dt is the flea on a dog, then $dt^2$ is a
submicroscopic flea that lives on the flea, as in Swift’s doggerel:
“Big fleas have little fleas/ On their backs to ride ’em,/ and little
fleas have lesser fleas,/ And so, ad infinitum.” Berkeley’s criticism
was off the mark here: there is such a hierarchy. Our basic assumption
about the hyperreals was that they contain at least one infinite
number, H, which is bigger than all the integers. If this is true, then
1/H must be less than 1/2, less than 1/100, less than 1/1,000,000,
and in fact less than 1/n for any positive integer n. Therefore the
hyperreals are guaranteed to include infinitesimals as well, and so we
have at least three levels to the hierarchy: infinities comparable to
H, finite numbers, and infinitesimals comparable to 1/H. If you can
swallow that, then it’s not too much of a leap to add more rungs to the
ladder, like extra-small infinitesimals that are comparable to
$1/H^2$. If this seems a little crazy, it may comfort you to think of
statements about the hyperreals as descriptions of limiting processes
involving real numbers. For instance, in the sequence of numbers
$1.1^2 = 1.21$, $1.01^2 = 1.0201$, $1.001^2 = 1.002001$, ..., it’s
clear that the number represented by the digit 1 in the final decimal
place is getting smaller faster than the contribution due to the digit 2
in the middle.
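If you'd like to see that pattern in numbers, here is a small sketch in Python (my choice of language for illustration; it is not the Yacas used later in this chapter, and the variable names are made up). It uses ordinary small real numbers as stand-ins for infinitesimals, so it only suggests the idea rather than proving anything about the hyperreals.

```python
from fractions import Fraction

# Square 1 + eps for eps = 1/10, 1/100, 1/1000 using exact fractions,
# and split the result as 1 + 2*eps + eps**2.
rows = []
for k in range(1, 4):
    eps = Fraction(1, 10**k)
    square = (1 + eps) ** 2
    middle = 2 * eps   # contribution of the digit 2 in the middle
    last = eps ** 2    # contribution of the digit 1 in the final place
    rows.append((square, middle, last, last / middle))

for square, middle, last, ratio in rows:
    print(float(square), float(middle), float(last), float(ratio))
```

The last column is the ratio of the two contributions, which works out to eps/2, so it shrinks by a factor of ten at each step.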

One subtle issue here, which I avoided mentioning in the
differentiation of the sine function on page 28, is whether the
transfer principle is sufficient to let us define all the functions
that appear as familiar keys on a calculator: $x^2$, $\sqrt{x}$,
$\sin x$, $\cos x$, $e^x$, and so on. After all, these functions were
originally defined as rules that would take a real number as an input
and give a real number as an output. It’s not trivially obvious that
their definitions can naturally be extended to take a hyperreal number
as an input and give back a hyperreal as an output. Essentially the
answer is that we can apply the transfer principle to them just as we
would to statements about simple arithmetic, but I’ve discussed this a
little more on page 149.

2.3  The  product  rule 

When I first learned calculus, it seemed to me that if the derivative
of 3t was 3, and the derivative of 7t was 7, then the derivative of t
multiplied by t ought to be just plain old t, not 2t. The reason
there’s a factor of 2 in the correct answer is that $t^2$ has two
reasons to grow as t gets bigger: it grows because the first factor of
t is increasing, but also because the second one is. In general, it’s
possible to find the derivative of the product of two functions any
time we know the derivatives of the individual functions.

The  product  rule 

If x and y are both functions of t, then the derivative of their product
is

$$\frac{d(xy)}{dt} = \frac{dx}{dt}\cdot y + x\cdot\frac{dy}{dt}.$$

The proof is easy. Changing t by an infinitesimal amount dt changes
the product xy by an amount

$$(x + dx)(y + dy) - xy = y\,dx + x\,dy + dx\,dy,$$

and dividing by dt makes this into

$$\frac{dx}{dt}\cdot y + x\cdot\frac{dy}{dt} + \frac{dx\,dy}{dt},$$

whose standard part is the result to be proved.
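The proof can be mimicked numerically. The following Python sketch (not Yacas; the functions x(t) and y(t) and the helper `slope` are hypothetical choices of mine) uses a small but finite dt, so the leftover term dx dy/dt shows up as a tiny discrepancy instead of vanishing.

```python
import math

def x(t):
    return t ** 2        # hypothetical example function x(t)

def y(t):
    return math.sin(t)   # hypothetical example function y(t)

def slope(f, t, dt=1e-6):
    """Finite-difference estimate of df/dt with a small but finite dt."""
    return (f(t + dt) - f(t)) / dt

t = 1.3
lhs = slope(lambda s: x(s) * y(s), t)           # d(xy)/dt
rhs = slope(x, t) * y(t) + x(t) * slope(y, t)   # (dx/dt)*y + x*(dy/dt)
leftover = lhs - rhs                            # plays the role of dx*dy/dt
print(lhs, rhs, leftover)
```

The leftover is about a million times smaller than either side, which is the finite-dt version of saying that it disappears when we take the standard part.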

Example 15

> Find the derivative of the function t sin t.

>

$$\frac{d(t\sin t)}{dt} = t\cdot\frac{d(\sin t)}{dt} + \frac{dt}{dt}\cdot\sin t = t\cos t + \sin t$$


Figure h gives the geometrical interpretation of the product rule.
Imagine that the king, in his castle at the southwest corner of his
rectangular kingdom, sends out a line of infantry to expand his
territory to the north, and a line of cavalry to take over more land to
the east. In a time interval dt, the cavalry, which moves faster,
covers a distance dx greater than that covered by the infantry, dy.
However, the strip of territory conquered by the cavalry, y dx, isn’t
as great as it could have been, because in our example y isn’t as big
as x.



h / A geometrical interpretation of the product rule.

A helpful feature of the Leibniz notation is that one can easily use it
to check whether the units of an answer make sense. If we measure
distances in meters and time in seconds, then xy has units of square
meters (area), and so does the change in the area, d(xy). Dividing by
dt gives the number of square meters per second being conquered. On the
right-hand side of the product rule, dx/dt has units of meters per
second (velocity), and multiplying it by y makes the units square
meters per second, which is consistent with the left-hand side. The
units of the second term on the right likewise check out. Some
beginners might be tempted to guess that the product rule would be
d(xy)/dt = (dx/dt)(dy/dt), but the Leibniz notation instantly reveals
that this can’t be the case, because then the units on the left, m²/s,
wouldn’t match the ones on the right, m²/s².

Because this unit-checking feature is so helpful, there is a special
way of writing a second derivative in the Leibniz notation. What Newton
called $\ddot{x}$, Leibniz wrote as

$$\frac{d^2 x}{dt^2}.$$

Although the different placement of the 2’s on top and bottom seems
strange and inconsistent to many beginners, it actually works out
nicely. If x is a distance, measured in meters, and t is a time, in
units of seconds, then the second derivative is supposed to have units
of acceleration, in units of meters per second per second, also written
(m/s)/s, or m/s². (The acceleration of falling objects on Earth is
9.8 m/s² in these units.) The Leibniz notation is meant to suggest
exactly this: the top of the fraction looks like it has units of
meters, because we’re not squaring x, while the bottom of the fraction
looks like it has units of seconds squared, because it looks like we’re
squaring dt. Therefore the units come out right. It’s important to
realize, however, that the symbol d isn’t a number (not a real one, and
not a hyperreal one, either), so we can’t really square it; the
notation is not to be taken as a literal statement about
infinitesimals.


Example  16 

A tricky use of the product rule is to find the derivative of
$\sqrt{t}$. Since $\sqrt{t}$ can be written as $t^{1/2}$, we might
suspect that the rule $d(t^k)/dt = kt^{k-1}$ would work, giving a
derivative $\frac{1}{2}t^{-1/2} = 1/(2\sqrt{t})$. However, the method
from ch. 1 used to prove that rule (given on p. 140) only works if k is
an integer, so the best we could do would be to confirm our conjecture
approximately by graphing or numerical estimation.

Using the product rule, we can write $f(t) = d\sqrt{t}/dt$ for our
unknown derivative, and back into the result using the product rule:

$$\frac{dt}{dt} = \frac{d(\sqrt{t}\sqrt{t})}{dt} = f(t)\sqrt{t} + \sqrt{t}f(t) = 2f(t)\sqrt{t}$$

But dt/dt = 1, so $f(t) = 1/(2\sqrt{t})$ as claimed.

The trick used in example 16 can also be used to prove that the power
rule $d(x^n)/dx = nx^{n-1}$ applies to cases where n is an integer less
than 0, but I’ll instead prove this on page 41 by a technique that
doesn’t depend on a trick, and also applies to values of n that aren’t
integers.


2.4  The  chain  rule 

Figure i shows three clowns on seesaws. If the leftmost clown moves
down by a distance dx, the middle one will come up by dy, but this will
also cause the one on the right to move down by dz. If we want to
predict how much the rightmost clown will move in response to a certain
amount of motion by the leftmost one, we have

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx}.$$

This is called the chain rule. It says that if a change in x causes y
to change, and y then causes z to change, then this chain of changes
has a cascading effect. Mathematically, there is no big mystery here.
We simply cancel dy on the top and bottom. The only minor subtlety is
that we would like to be able to be sloppy by using an expression like
dy/dx to mean both the quotient of two infinitesimal numbers and a
derivative, which is defined as the standard part of this quotient.
This sloppiness turns out to be all right, as proved on page 151.

Example 17

> Jane hikes 3 kilometers in an hour, and hiking burns 70 calories per
kilometer. At what rate does she burn calories?

> We let x be the number of hours she’s spent hiking so far, y the
distance covered, and z the calories spent. Then

$$\frac{dz}{dx} = \left(\frac{70\ \text{cal}}{1\ \text{km}}\right)\left(\frac{3\ \text{km}}{1\ \text{hr}}\right) = 210\ \text{cal/hr}.$$


Example 18

> Figure j shows a piece of farm equipment containing a train of gears
with 13, 21, and 42 teeth. If the smallest gear is driven by a motor,
relate the rate of rotation of the biggest gear to the rate of rotation
of the motor.

> Let x, y, and z be the angular positions of the three gears. Then by
the chain rule,

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \frac{13}{21}\cdot\frac{21}{42} = \frac{13}{42}.$$

i / Three clowns on seesaws demonstrate the chain rule.


j  /  Example  18. 


The  chain  rule  lets  us  find  the 
derivative  of  a  function  that  has 
been  built  out  of  one  function  stuck 
inside  another. 


Example 19

> Find the derivative of the function $z(x) = \sin(x^2)$.

> Let $y(x) = x^2$, so that $z(x) = \sin(y(x))$. Then

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = \cos(y)\cdot 2x = 2x\cos(x^2)$$

The  way  people  usually  say  it  is  that 
the  chain  rule  tells  you  to  take  the 
derivative  of  the  outside  function,  the 
sine  in  this  case,  and  then  multiply 
by  the  derivative  of  “the  inside  stuff,” 
which  here  is  the  square.  Once  you 
get  used  to  doing  it,  you  don’t  need 
to  invent  a  third,  intermediate  variable, 
as  we  did  here  with  y. 
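A result like this is easy to sanity-check with a small finite step. The sketch below is in Python rather than Yacas, and the helper names `slope` and `chain_rule_result` are my own; it compares a finite-difference slope of sin(x²) with the formula 2x cos(x²) at a few arbitrarily chosen points.

```python
import math

def z(x):
    return math.sin(x ** 2)

def chain_rule_result(x):
    return 2 * x * math.cos(x ** 2)   # the formula derived in example 19

def slope(f, x, dx=1e-6):
    """Finite-difference estimate of df/dx with a small but finite dx."""
    return (f(x + dx) - f(x)) / dx

for x in (0.5, 1.0, 2.0):
    print(x, slope(z, x), chain_rule_result(x))
```

The two columns should agree to several decimal places; they don't agree exactly because dx here is small but not infinitesimal.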

Example  20 

Let’s express the chain rule without the use of the Leibniz notation.
Let the function f be defined by f(x) = g(h(x)). Then the derivative of
f is given by f′(x) = g′(h(x)) · h′(x).

Example 21

> We’ve already proved that the derivative of $t^k$ is $kt^{k-1}$ for
k = -1 (example 10 on p. 27) and for k = 1, 2, 3, ... (p. 140). Use
these facts to extend the rule to all integer values of k.

> For k < 0, the function $x = t^k$ can be written as
$x = (t^{-1})^{-k}$, where -k is positive. Applying the chain rule, we
find $dx/dt = (-k)(t^{-1})^{-k-1}(-t^{-2}) = kt^{k-1}$.

2.5  Exponentials  and 
logarithms 

The  exponential 

The exponential function $e^x$, where e = 2.71828... is the base of
natural logarithms, comes up constantly in applications as diverse as
credit-card interest, the growth of animal populations, and electric
circuits. For its derivative we have

$$\frac{de^x}{dx} = \frac{e^{x+dx} - e^x}{dx} = \frac{e^x e^{dx} - e^x}{dx} = e^x\,\frac{e^{dx} - 1}{dx}.$$

The second factor, $(e^{dx} - 1)/dx$, doesn’t have x in it, so it must
just be a constant. Therefore we know that the derivative of $e^x$ is
simply $e^x$, multiplied by some unknown constant,

$$\frac{de^x}{dx} = c\,e^x.$$

A rough check by graphing at, say, x = 0, shows that the slope is close
to 1, so c is close to 1. Numerical calculation also shows that, for
example, $(e^{0.001} - 1)/0.001 = 1.00050016670838$ is very close to 1.
But how do we know it’s exactly one when dx is really infinitesimal?
We can use Inf:

[exp(d)-1]/d
1+0.5d+...

(The ... indicates where I’ve snipped some higher-order terms out of
the output.) It seems clear that c is equal to 1 except for negligible
terms involving higher powers of dx. A rigorous proof is given on page
151.
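The same trend can be seen with ordinary finite numbers. Here is a short Python sketch (Python standing in for the tools used in the text; `math.expm1` is a standard-library function that computes e^dx - 1 without losing digits when dx is small). It tabulates (e^dx - 1)/dx next to 1 + dx/2, the first two terms of the series shown above.

```python
import math

# (e^dx - 1)/dx for smaller and smaller finite dx, next to 1 + dx/2.
ratios = []
for k in range(1, 6):
    dx = 10.0 ** (-k)
    ratio = math.expm1(dx) / dx   # expm1(dx) computes e^dx - 1 accurately
    ratios.append(ratio)
    print(dx, ratio, 1 + 0.5 * dx)
```

As dx shrinks, the ratio closes in on 1, and the gap is well described by dx/2.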

Example 22

> The concentration of a foreign substance in the bloodstream generally
falls off exponentially with time as $c = c_0 e^{-t/a}$, where $c_0$ is
the initial concentration, and a is a constant. For caffeine in adults,
a is typically about 7 hours. An example is shown in figure k.
Differentiate the concentration with respect to time, and interpret the
result. Check that the units of the result make sense.

> Using the chain rule,

$$\frac{dc}{dt} = c_0 e^{-t/a}\cdot\left(-\frac{1}{a}\right) = -\frac{c_0 e^{-t/a}}{a}.$$

This can be interpreted as the rate at which caffeine is being removed
from the blood and broken down by the liver. It’s negative because the
concentration is decreasing. According to the original expression for
c, a substance with a large a will take a long time to reduce its
concentration, since t/a won’t be very big unless we have large t on
top to compensate for the large a on the bottom. In other words, larger
values of a represent substances that the body has a harder time
getting rid of efficiently. The derivative has a on the bottom, and the
interpretation of this is that for a drug that is hard to eliminate,
the rate at which it is removed from the blood is low.

It makes sense that a has units of time, because the exponential
function has to have a unitless argument, so the units of t/a have to
cancel out. The units of the result come from the factor of $c_0/a$,
and it makes sense that the units are concentration divided by time,
because the result represents the rate at which the concentration is
changing.
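To put numbers on this, here is a Python sketch of the example. The initial concentration of 3 mg/L is a value I've assumed purely for illustration; only the 7-hour time constant comes from the text. The function names are hypothetical.

```python
import math

C0 = 3.0   # initial concentration in mg/L (assumed value, for illustration only)
A = 7.0    # time constant in hours, the typical adult value quoted in the text

def concentration(t):
    return C0 * math.exp(-t / A)

def rate(t):
    return -C0 * math.exp(-t / A) / A   # dc/dt from example 22, in mg/L per hour

for t in (0.0, 7.0, 14.0):
    print(t, concentration(t), rate(t))
```

The rate is negative throughout and shrinks in size as the concentration falls, in keeping with the interpretation given above.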


k  /  Example  22.  A  typ¬ 
ical  graph  of  the  con¬ 
centration  of  caffeine  in 
the  blood,  in  units  of  mil¬ 
ligrams  per  liter,  as  a 
function  of  time,  in  hours. 

Example 23

> Find the derivative of the function $y = 10^x$.

> In general, one of the tricks to doing calculus is to rewrite
functions in forms that you know how to handle. This one can be
rewritten as a base-e exponent:

$$y = 10^x$$
$$\ln y = \ln(10^x)$$
$$\ln y = x\ln 10$$
$$y = e^{x\ln 10}$$

Applying the chain rule, we have the derivative of the exponential,
which is just the same exponential, multiplied by the derivative of the
inside stuff:

$$\frac{dy}{dx} = e^{x\ln 10}\cdot\ln 10.$$

In other words, the “c” referred to in the discussion of the derivative
of $e^x$ becomes c = ln 10 in the case of the base-10 exponential.


The  logarithm 

The natural logarithm is the function that undoes the exponential.
In a situation like this, we have

$$\frac{dy}{dx} = \frac{1}{dx/dy},$$

where on the left we’re thinking of y as a function of x, and on the
right we consider x to be a function of y. Applying this to the natural
logarithm,

$$y = \ln x$$
$$x = e^y$$
$$\frac{dx}{dy} = e^y$$
$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}$$
$$\frac{d\ln x}{dx} = \frac{1}{x}.$$

This is noteworthy because it shows that there must be an exception to
the rule that the derivative of $x^n$ is $nx^{n-1}$, and the integral
of $x^{n-1}$ is $x^n/n$. (On page 37 I remarked that this rule could be
proved using the product rule for negative integer values of k, but
that I would give a simpler, less tricky, and more general proof later.
The proof is example 24 below.) The integral of $x^{-1}$ is not
$x^0/0$, which wouldn’t make sense anyway because it involves division
by zero.5 Likewise the derivative of $x^0 = 1$ is $0x^{-1}$, which is
zero. Figure l shows the idea. The functions $x^n$ form a kind of
ladder, with differentiation taking us down one rung, and integration
taking us up. However, there are two special cases where
differentiation takes us off the ladder entirely.

l / Differentiation and integration of functions of the form $x^n$.
Constants out in front of the functions are not shown, so keep in mind
that, for example, the derivative of $x^2$ isn’t x, it’s 2x.

Example 24

> Prove $d(x^n)/dx = nx^{n-1}$ for any real value of n, not just an
integer.

>

$$y = x^n = e^{n\ln x}$$

By the chain rule,

$$\frac{dy}{dx} = e^{n\ln x}\cdot\frac{n}{x} = x^n\cdot\frac{n}{x} = nx^{n-1}.$$

(For n = 0, the result is zero.)

5 Speaking casually, one can say that division by zero gives infinity.
This is often a good way to think when trying to connect mathematics to
reality. However, it doesn’t really work that way according to our
rigorous treatment of the hyperreals. Consider this statement: “For a
nonzero real number a, there is no real number b such that a = 0b.”
This means that we can’t divide a by 0 and get b. Applying the transfer
principle to this statement, we see that the same is true for the
hyperreals: division by zero is undefined. However, we can divide a
finite number by an infinitesimal, and get an infinite result, which is
almost the same thing.

When I started the discussion of the derivative of the logarithm, I
wrote y = ln x right off the bat. That meant I was implicitly assuming
x was positive. More generally, the derivative of ln |x| equals 1/x,
regardless of the sign (see problem 29 on page 50).

2.6  Quotients 

So far we’ve been successful with a divide-and-conquer approach to
differentiation: the product rule and the chain rule offer methods of
breaking a function down into simpler parts, and finding the derivative
of the whole thing based on knowledge of the derivatives of the parts.
We know how to find the derivatives of sums, differences, and products,
so the obvious next step is to look for a way of handling division.
This is straightforward, since we know that the derivative of the
function $1/u = u^{-1}$ is $-u^{-2}$. Let u and v be functions of x.
Then by the product rule,

$$\frac{d(v/u)}{dx} = \frac{dv}{dx}\cdot\frac{1}{u} + v\cdot\frac{d(1/u)}{dx},$$

and by the chain rule,

$$\frac{d(v/u)}{dx} = \frac{dv}{dx}\cdot\frac{1}{u} - v\cdot\frac{1}{u^2}\frac{du}{dx}.$$

This is so easy to rederive on demand that I suggest not memorizing it.

By the way, notice how the notation becomes a little awkward when we
want to write a derivative like d(v/u)/dx. When we’re differentiating a
complicated function, it can be uncomfortable trying to cram the
expression into the top of the d.../d... fraction. Therefore it would
be more common to write such an expression like this:

$$\frac{d}{dx}\left(\frac{v}{u}\right)$$

This could be considered an abuse of notation, making d look like a
number being divided by another number dx, when actually d is
meaningless on its own. On the other hand, we can consider the symbol
d/dx to represent the operation of differentiation with respect to x;
such an interpretation will seem more natural to those who have been
inculcated with the taboo against considering infinitesimals as numbers
in the first place.

Using the new notation, the quotient rule becomes

$$\frac{d}{dx}\left(\frac{v}{u}\right) = \frac{1}{u}\cdot\frac{dv}{dx} - \frac{v}{u^2}\cdot\frac{du}{dx}.$$

The interpretation of the minus sign is that if u increases, v/u
decreases.

Example 25

> Differentiate y = x/(1 + 3x), and check that the result makes sense.

> We identify v with x and u with 1 + 3x. The result is

$$\frac{d}{dx}\left(\frac{v}{u}\right) = \frac{1}{u}\cdot\frac{dv}{dx} - \frac{v}{u^2}\cdot\frac{du}{dx} = \frac{1}{1+3x} - \frac{3x}{(1+3x)^2}.$$

One way to check that the result
makes  sense  is  to  consider  extreme 
values  of  x.  For  very  large  values  of  x, 
the  1  on  the  bottom  of  x/(1  +  3x)  be¬ 
comes  negligible  compared  to  the  3x, 
and  the  function  y  approaches  x/3x  = 
1/3  as  a  limit.  Therefore  we  expect 
that  the  derivative  dy/dx  should  ap¬ 
proach  zero,  since  the  derivative  of 
a  constant  is  zero.  It  works:  plug¬ 
ging  in  bigger  and  bigger  numbers  for 
x  in  the  expression  for  the  derivative 
does  give  smaller  and  smaller  results. 
(In  the  second  term,  the  denominator 
gets  bigger  faster  than  the  numerator, 
because  it  has  a  square  in  it.) 

Another  way  to  check  the  result  is  to 
verify  that  the  units  work  out.  Sup¬ 
pose  arbitrarily  that  x  has  units  of  gal¬ 
lons.  (If  the  3  on  the  bottom  is  unitless, 
then  the  1  would  have  to  represent  1 
gallon,  since  you  can’t  add  things  that 
have  different  units.)  The  function  y  is 
defined  by  an  expression  with  units  of 
gallons  divided  by  gallons,  so  y  is  unit¬ 
less.  Therefore  the  derivative  dy/dx 
should  have  units  of  inverse  gallons. 
Both  terms  in  the  expression  for  the 
derivative  do  have  those  units,  so  the 
units  of  the  answer  check  out. 
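The first of these checks, plugging in bigger and bigger numbers, takes only a few lines on a computer. This is a Python sketch rather than Yacas, and the function names `y` and `dydx` are simply labels I've chosen for the function and the derivative found in example 25.

```python
def y(x):
    return x / (1 + 3 * x)

def dydx(x):
    # result of example 25: 1/(1+3x) - 3x/(1+3x)^2
    return 1 / (1 + 3 * x) - 3 * x / (1 + 3 * x) ** 2

for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, y(x), dydx(x))
```

The second column creeps up toward 1/3 while the third falls toward zero, as expected.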

2.7  Differentiation  on 
a  computer 

In this chapter you’ve learned a set of rules for evaluating
derivatives: derivatives of products, quotients, functions inside other
functions, etc. Because these rules exist, it’s always possible to find
a formula for a function’s derivative, given the formula for the
original function. Not only that, but there is no real creativity
required, so a computer can be programmed to do all the drudgery. For
example, you can download a free, open-source program called Yacas from
yacas.sourceforge.net and install it on a Windows or Linux machine.
There is even a version you can run in a web browser without installing
any special software:
http://yacas.sourceforge.net/yacasconsole.html
A typical session with Yacas looks like this:

Example  26 

D(x) x^2
2*x
D(x) Exp(x^2)
2*x*Exp(x^2)
D(x) Sin(Cos(Sin(x)))
-Cos(x)*Sin(Sin(x))*Cos(Cos(Sin(x)))

Upright  type  represents  your  in¬ 
put,  and  italicized  type  is  the  pro¬ 
gram’s  output. 

First I asked it to differentiate $x^2$ with respect to x, and it told
me the result was 2x. Then I did the derivative of $e^{x^2}$, which I
also could have done fairly easily by hand. (If you’re trying this out
on a computer as you read along, make sure to capitalize functions like
Exp, Sin, and Cos.) Finally I tried an example where I didn’t know the
answer off the top of my head, and that would have been a little
tedious to calculate by hand.

Unfortunately things are a little less rosy in the world of integrals.
There are a few rules that can help you do integrals, e.g., that the
integral of a sum equals the sum of the integrals, but the rules don’t
cover all the possible cases. Using Yacas to evaluate the integrals of
the same functions, here’s what happens.6

Example  27 

Integrate(x) x^2
x^3/3
Integrate(x) Exp(x^2)
Integrate(x)Exp(x^2)
Integrate(x)
    Sin(Cos(Sin(x)))
Integrate(x)
    Sin(Cos(Sin(x)))

The first one works fine, and I can easily verify that the answer is
correct, by taking the derivative of $x^3/3$, which is $x^2$. (The
answer could have been $x^3/3 + 7$, or $x^3/3 + c$, where c was any
constant, but Yacas doesn’t bother to tell us that.) The second and
third ones don’t work, however; Yacas just spits back the input at us
without making any progress on it. And it may not be because Yacas
isn’t smart enough to figure out these integrals. The function
$e^{x^2}$ can’t be integrated at all in terms of a formula containing
ordinary operations and functions such as addition, multiplication,
exponentiation, trig functions, exponentials, and so on.

6 If you’re trying these on your own computer, note that the long input
line for the function sin cos sin x shouldn’t be broken up into two
lines as shown in the listing.


That’s  not  to  say  that  a  program 
like  this  is  useless.  For  example, 
here’s  an  integral  that  I  wouldn’t 
have  known  how  to  do,  but  that 
Yacas  handles  easily: 

Example  28 

Integrate(x) Sin(Ln(x))
(x*Sin(Ln(x)))/2-(x*Cos(Ln(x)))/2

This  one  is  easy  to  check  by  dif¬ 
ferentiating,  but  I  could  have  been 
marooned  on  a  desert  island  for  a 
decade  before  I  could  have  figured 
it  out  in  the  first  place.  There  are 
various  rules,  then,  for  integration, 
but  they  don’t  cover  all  possible 
cases  as  the  rules  for  differentiation 
do,  and  sometimes  it  isn’t  obvious 
which  rule  to  apply.  Yacas’s  ability 
to integrate sin ln x shows that it
had  a  rule  in  its  bag  of  tricks  that 
I  don’t  know,  or  didn’t  remember, 
or  didn’t  realize  applied  to  this  in¬ 
tegral. 

Back  in  the  17th  century,  when 
Newton  and  Leibniz  invented  cal¬ 
culus,  there  were  no  computers,  so 
it  was  a  big  deal  to  be  able  to  find 
a  simple  formula  for  your  result. 
Nowadays,  however,  it  may  not  be 
such  a  big  deal.  Suppose  I  want  to 
find the derivative of sin cos sin x,
evaluated  at  x  =  1.  I  can  do  some¬ 
thing  like  this  on  a  calculator: 

Example  29 

sin cos sin 1 =
0.61813407
sin cos sin 1.0001 =
0.61810240
(0.61810240-0.61813407)/.0001 =
-0.3167

I have the right answer, with plenty of precision for most realistic
applications, although I might have never guessed that the mysterious
number -0.3167 was actually -(cos 1)(sin sin 1)(cos cos sin 1). This
could get a little tedious if I wanted to graph the function, for
instance, but then I could just use a computer spreadsheet, or write a
little computer program. In this chapter, I’m going to show you how to
do derivatives and integrals using simple computer programs written in
Yacas. The following little Yacas program does the same thing as the
set of calculator operations shown above:

Example  30 

1 f(x):=Sin(Cos(Sin(x)))
2 x:=1
3 dx:=.0001
4 N( (f(x+dx)-f(x))/dx )
-0.3166671628

(I’ve  omitted  all  of  Yacas’s  output 
except  for  the  final  result.)  Line 
1  defines  the  function  we  want  to 
differentiate.  Lines  2  and  3  give 
values  to  the  variables  x  and  dx. 
Line  4  computes  the  derivative;  the 
N  (  )  surrounding  the  whole  thing 
is  our  way  of  telling  Yacas  that  we 
want  an  approximate  numerical  re¬ 
sult,  rather  than  an  exact  symbolic 
one. 

An  interesting  thing  to  try  now  is 
to  make  dx  smaller  and  smaller, 
and see if we get better and better
accuracy in our approximation
to the derivative.

Example  31 

5 g(x,dx):=N( (f(x+dx)-f(x))/dx )
6 g(x,.1)
-0.3022356406
7 g(x,.0001)
-0.3166671628
8 g(x,.0000001)
-0.3160458019
9 g(x,.00000000000000001)
0

Line  5  defines  the  derivative  func¬ 
tion.  It  needs  to  know  both  x  and 
dx.  Line  6  computes  the  derivative 
using  dx  =  0.1,  which  we  expect  to 
be  a  lousy  approximation,  since  dx 
is  really  supposed  to  be  infinitesi¬ 
mal,  and  0.1  isn’t  even  that  small. 
Line  7  does  it  with  the  same  value 
of  dx  we  used  earlier.  The  two  re¬ 
sults  agree  exactly  in  the  first  dec¬ 
imal  place,  and  approximately  in 
the  second,  so  we  can  be  pretty 
sure that the derivative is -0.32
to  two  figures  of  precision.  Line 
8  ups  the  ante,  and  produces  a  re¬ 
sult  that  looks  accurate  to  at  least 
3  decimal  places.  Line  9  attempts 
to  produce  fantastic  precision  by 
using  an  extremely  small  value  of 
dx.  Oops  —  the  result  isn’t  bet¬ 
ter,  it’s  worse!  What’s  happened 
here is that Yacas computed f(x)
and f(x + dx), but they were the
same to within the precision it was
using, so f(x + dx) - f(x) rounded
off to zero.7
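The same loss of precision can be reproduced in Python, sketched here under the assumption of ordinary double-precision floats, which carry roughly 16 significant digits. The intermediate estimates need not match the Yacas output digit for digit, because the two systems use different working precision. The closed-form value is used only to measure the error.

```python
import math

def f(x):
    return math.sin(math.cos(math.sin(x)))

def g(x, dx):
    """Forward-difference estimate of f'(x), as in line 5 of Example 31."""
    return (f(x + dx) - f(x)) / dx

# Closed-form derivative at x = 1, used only to measure the error.
exact = -math.cos(1.0) * math.sin(math.sin(1.0)) * math.cos(math.cos(math.sin(1.0)))

results = {}
for dx in (1e-1, 1e-4, 1e-7, 1e-17):
    results[dx] = g(1.0, dx)
    error = abs(results[dx] - exact)
    print(f"dx={dx:g}  estimate={results[dx]:.10f}  error={error:.2e}")

# In double precision 1.0 + 1e-17 rounds back to exactly 1.0,
# so f(x + dx) and f(x) are literally the same number.
print(1.0 + 1e-17 == 1.0)
```

The last estimate comes out as zero for the reason described above: the two function values are identical at the working precision, so their difference vanishes.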

7 Yacas can do arithmetic to any
precision you like, although you may
run into practical limits due to the
amount of memory your computer has
and the speed of its CPU. For fun,
try N(Pi,1000), which tells Yacas to
compute π numerically to 1000 decimal
places.

Example 31 demonstrates the concept
of how a derivative can be defined
in terms of a limit:

\[ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} \]

The idea of the limit is that we
can theoretically make Δy/Δx approach
as close as we like to dy/dx,
provided we make Δx sufficiently
small. In reality, of course, we
eventually run into the limits of
our ability to do the computation,
as in the bogus result generated on
line 9 of the example.


Problems

1 Carry out a calculation like
the one in example 9 on page 26
to show that the derivative of t^4
equals 4t^3. > Solution, p. 173

2  Example  12  on  page  29  gave 
a  tricky  argument  to  show  that  the 
derivative of cos t is -sin t. Prove
the  same  result  using  the  method 
of  example  11  instead. 

>  Solution,  p.  174 

3  Suppose  H  is  a  big  number. 

Experiment on a calculator to figure
out whether √(H + 1) - √(H - 1)
comes  out  big,  normal,  or  tiny.  Try 
making  H  bigger  and  bigger,  and 
see  if  you  observe  a  trend.  Based 
on  these  numerical  examples,  form 
a  conjecture  about  what  happens 
to this expression when H is infinite.
> Solution, p. 174

4 Suppose dx is a small but
finite number. Experiment on a
calculator to figure out how √dx
compares in size to dx. Try making
dx smaller and smaller, and
see if you observe a trend. Based
on these numerical examples, form
a conjecture about what happens
to this expression when dx is infinitesimal.
> Solution, p. 174

5  To  which  of  the  following 
statements  can  the  transfer  prin¬ 
ciple  be  applied?  If  you  think  it 
can’t  be  applied  to  a  certain  state¬ 
ment,  try  to  prove  that  the  state¬ 
ment  is  false  for  the  hyperreals, 
e.g.,  by  giving  a  counterexample. 


(a)  For  any  real  numbers  x  and  y, 
x  +  y  =  y  +  x. 

(b)  The  sine  of  any  real  number  is 
between  —1  and  1. 

(c)  For  any  real  number  x,  there 
exists  another  real  number  y  that 
is  greater  than  x. 

(d) For any real numbers x ≠ y,
there  exists  another  real  number  z 
such  that  x  <  z  <  y. 

(e) For any real numbers x ≠ y,
there  exists  a  rational  number  z 
such  that  x  <  z  <  y.  (A  ratio¬ 
nal  number  is  one  that  can  be  ex¬ 
pressed  as  an  integer  divided  by 
another  integer.) 

(f)  For  any  real  numbers  x,  y,  and 
z,  (x  +  y)  +  z  =  x  +  (y  +  z). 

(g)  For  any  real  numbers  x  and  y, 
either  x  <  y  or  x  =  y  or  x  >  y. 

(h) For any real number x, x + 1 ≠ x.
> Solution, p. 175

6  If  we  want  to  pump  air 
or  water  through  a  pipe,  com¬ 
mon  sense  tells  us  that  it  will  be 
easier  to  move  a  larger  quantity 
more  quickly  through  a  fatter  pipe. 
Quantitatively,  we  can  define  the 
resistance,  R,  which  is  the  ratio 
of  the  pressure  difference  produced 
by  the  pump  to  the  rate  of  flow. 
A  fatter  pipe  will  have  a  lower  re¬ 
sistance.  Two  pipes  can  be  used 
in  parallel,  for  instance  when  you 
turn  on  the  water  both  in  the 
kitchen  and  in  the  bathroom,  and 
in  this  situation,  the  two  pipes  let 
more  water  flow  than  either  would 
have  let  flow  by  itself,  which  tells 
us  that  they  act  like  a  single  pipe 
with  some  lower  resistance.  The 
equation for their combined resistance
is R = 1/(1/R_1 + 1/R_2).
Analyze  the  case  where  one  resis¬ 
tance  is  finite,  and  the  other  infi¬ 
nite,  and  give  a  physical  interpre¬ 
tation.  Likewise,  discuss  the  case 
where  one  is  finite,  but  the  other  is 
infinitesimal. 

> Solution, p. 175

7  Naively,  we  would  imagine 
that  if  a  spaceship  traveling  at  u  = 
3/4  of  the  speed  of  light  was  to 
shoot  a  missile  in  the  forward  di¬ 
rection  at  v  =  3/4  of  the  speed 
of  light  (relative  to  the  ship) ,  then 
the  missile  would  be  traveling  at 
u  +  v  =  3/2  of  the  speed  of  light. 
However,  Einstein’s  theory  of  rela¬ 
tivity  tells  us  that  this  is  too  good 
to  be  true,  because  nothing  can  go 
faster  than  light.  In  fact,  the  rela¬ 
tivistic  equation  for  combining  ve¬ 
locities  in  this  way  is  not  u  +  v,  but 
rather (u + v)/(1 + uv). In ordinary,
everyday life, we never travel
at speeds anywhere near the speed
of light. Show that the nonrelativistic
result is recovered in the
case  where  both  u  and  v  are  in¬ 
finitesimal.  >  Solution,  p.  175 

8 Differentiate (2x + 3)^100 with
respect to x. > Solution, p. 175

9 Differentiate (x + 1)^100 (x +
2)^200 with respect to x.

>  Solution,  p.  176 

10 Differentiate the following
with respect to x: e^(7x), e^(e^x). (In
the latter expression, as in all exponentials
nested inside exponentials,
the evaluation proceeds from
the top down, i.e., e^(e^x), not (e^e)^x.)

>  Solution,  p.  176 

11 Differentiate a sin(bx + c)
with  respect  to  x. 

>  Solution,  p.  176 

12 Let x = t^(p/q), where p and
q are positive integers. By a technique
similar to the one in example
21 on p. 38, prove that the differentiation
rule for t^k holds when
k = p/q. > Solution, p. ??

13  Find  a  function  whose 
derivative  with  respect  to  x  equals 
a sin(bx + c). That is, find an integral
of the given function.

>  Solution,  p.  176 

14 Use the chain rule to differentiate
((x^2)^2)^2, and show that you
get the same result you would have
obtained by differentiating x^8.

> Solution, p. 176 [M. Livshits]

15 The range of a gun, when
elevated to an angle θ, is given by

\[ R = \frac{2v^2}{g} \sin\theta \cos\theta. \]

Find  the  angle  that  will  produce 
the  maximum  range. 

>  Solution,  p.  177 

16 Differentiate sin cos tan x
with  respect  to  x. 

17 The hyperbolic cosine function
is defined by

\[ \cosh x = \frac{e^x + e^{-x}}{2}. \]

Find  any  minima  and  maxima  of 
this  function. 


>  Solution,  p.  177 


18 Show that the function
sin(sin(sin x)) has maxima and
minima at all the same places
where sin x does, and at no other
places. > Solution, p. 177

19 Let f(x) = |x| + x and g(x) =
x|x| + x. Find the derivatives of
these functions at x = 0 in terms
of (a) slopes of tangent lines and
(b) infinitesimals.

>  Solution,  p.  178 

20  In  free  fall,  the  acceleration 
will  not  be  exactly  constant,  due 
to  air  resistance.  For  example,  a 
skydiver  does  not  speed  up  indefi¬ 
nitely  until  opening  her  chute,  but 
rather  approaches  a  certain  maxi¬ 
mum  velocity  at  which  the  upward 
force  of  air  resistance  cancels  out 
the force of gravity. The expression
for the distance dropped by
a free-falling object, with air resistance,
is8

\[ d = A \ln\left[\cosh\left(t\sqrt{g/A}\right)\right], \]


where  g  is  the  acceleration  the  ob¬ 
ject  would  have  without  air  resis¬ 
tance,  the  function  cosh  has  been 
defined  in  problem  17,  and  A  is  a 
constant  that  depends  on  the  size, 
shape,  and  mass  of  the  object,  and 
the density of the air. (For a sphere
of mass m and diameter d dropping
in air, A = 4.11m/d^2. Cf. problem
10, p. 115.)

(a) Differentiate this expression to
find the velocity. Hint: In order to
simplify the writing, start by defining
some other symbol to stand for
the constant √(g/A).

8 Jan Benacka and Igor Stubna, The
Physics Teacher, 43 (2005) 432.

(b) Show that your answer can be
reexpressed in terms of the function
tanh defined by tanh x = (e^x -
e^(-x))/(e^x + e^(-x)).

(c)  Show  that  your  result  for  the 
velocity  approaches  a  constant  for 
large  values  of  t. 

(d)  Check  that  your  answers  to 
parts  b  and  c  have  units  of  velocity. 

>  Solution,  p.  179 

21 Differentiate tan θ with respect
to θ. > Solution, p. 179

22 Differentiate ∛x with respect
to x. > Solution, p. 179

23  Differentiate  the  following 
with  respect  to  x : 

(a) y = √(x^2 + 1)

(b) y = √(x^2 + a^2)

(c) y = 1/√(a + x)

(d) y = a/√(a - x^2)


>  Solution,  p.  179  [Thompson,  1919] 

24 Differentiate ln(2t + 1) with
respect  to  t.  >  Solution,  p.  180 

25 If you know the derivative of
sin x, it's not necessary to use the
product  rule  in  order  to  differenti¬ 
ate  3  sin  x,  but  show  that  using  the 
product  rule  gives  the  right  result 
anyway.  >  Solution,  p.  180 

26 The Γ function (capital
Greek letter gamma) is a continuous
mathematical function that
has the property Γ(n) = 1 · 2 ·
... · (n - 1) for n an integer. Γ(x)
is also well defined for values of x
that are not integers, e.g., Γ(1/2)
happens to be √π. Use computer
software that is capable of evaluating
the Γ function to determine
numerically the derivative of Γ(x)
with respect to x, at x = 2. (In Yacas,
the function is called Gamma.)

>  Solution,  p.  180 

27  For  a  cylinder  of  fixed 
surface  area,  what  proportion  of 
length  to  radius  will  give  the  max¬ 
imum  volume? 

>  Solution,  p.  180 

28  This  problem  is  a  varia¬ 
tion  on  problem  11  on  page  21. 
Einstein  found  that  the  equation 
K = (1/2)mv^2 for kinetic energy
was only a good approximation for
speeds much less than the speed of
light, c. At speeds comparable to
the speed of light, the correct equation
is

\[ K = \frac{\frac{1}{2}mv^2}{\sqrt{1 - v^2/c^2}}. \]

(a) As in the earlier, simpler problem,
find the power dK/dt for
an object accelerating at a steady
rate, with v = at.

(b)  Check  that  your  answer  has  the 
right  units. 

(c) Verify that the power required
becomes infinite in the limit as v
approaches c, the speed of light.
This means that no material object
can go as fast as the speed of
light. > Solution, p. 181

29 Prove, as claimed on page
42, that the derivative of ln |x|
equals 1/x, for both positive and
negative x. > Solution, p. 181


30 An even function is one with
the property f(-x) = f(x). For
example, cos x is an even function,
and x^n is an even function
if n is even. An odd function has
f(-x) = -f(x). Prove that the
derivative of an even function is
odd. > Solution, p. 181

31 Suppose we have a list of
numbers x_1, ..., x_n, and we wish to
find some number q that is as close
as possible to as many of the x_i
as possible. To make this a mathematically
precise goal, we need
to define some numerical measure
of this closeness. Suppose we let
h = (x_1 - q)^2 + ... + (x_n - q)^2,
which can also be notated using
Σ, uppercase Greek sigma,
as h = Σ_{i=1}^n (x_i - q)^2. Then
minimizing h can be used as a
definition of optimal closeness.
(Why would we not want to use
h = Σ_{i=1}^n (x_i - q)?) Prove that
the value of q that minimizes h is
the average of the x_i.

32 Use a trick similar to the one
used in example 16 to prove that
the power rule d(x^k)/dx = kx^(k-1)
applies to cases where k is an integer
less than 0.

> Solution, p. 182 ★

33 The plane of Euclidean geometry
is today often described
as the set of all coordinate pairs
(x, y), where x and y are real; call
this plane E. We
could instead imagine the plane F
that is defined in the same way, but
with x and y taken from the set
of hyperreal numbers. As a third
alternative,  there  is  the  plane  G 
in  which  the  finite  hyperreals  are 
used.  In  E,  Euclid’s  parallel  postu¬ 
late  holds:  given  a  line  and  a  point 
not  on  the  line,  there  exists  ex¬ 
actly  one  line  passing  through  the 
point  that  does  not  intersect  the 
line.  Does  the  parallel  postulate 
hold  in  F?  In  G?  Is  it  valid  to  as¬ 
sociate  only  E  with  the  plane  de¬ 
scribed  by  Euclid’s  axioms? 

>  Solution,  p.  182  ★ 

34  Discuss  the  following  state¬ 
ment:  The  repeating  decimal 

0.999  ...  is  infinitesimally  less  than 
one.  >  Solution,  p.  182 

35  Example  20  on  page  38  ex¬ 
pressed  the  chain  rule  without  the 
Leibniz notation, writing a function
f defined by f(x) = g(h(x)).
Suppose that you're trying to remember
the rule, and two of the
possibilities that come to mind are
f'(x) = g'(h(x)) and f'(x) =
g'(h(x))h(x). Show that neither
of  these  can  possibly  be  right,  by 
considering  the  case  where  x  has 
units.  You  may  find  it  helpful  to 
convert  both  expressions  back  into 
the  Leibniz  notation. 

>  Solution,  p.  183 

36  When  you  tune  in  a  radio 
station  using  an  old-fashioned  ro¬ 
tating  dial  you  don’t  have  to  be 
exactly  tuned  in  to  the  right  fre¬ 
quency  in  order  to  get  the  station. 
If  you  did,  the  tuning  would  be  in¬ 
finitely  sensitive,  and  you’d  never 
be  able  to  receive  any  signal  at  all! 


Instead,  the  tuning  has  a  certain 
amount  of  “slop”  intentionally  de¬ 
signed  into  it.  The  strength  of  the 
received signal s can be expressed
in terms of the dial's setting f by
a function of the form

\[ s = \frac{1}{\sqrt{a(f^2 - f_0^2)^2 + bf^2}}, \]

where a, b, and f_0 are constants.
This functional form is in fact
very general, and is encountered in
many other physical contexts. The
graph below shows the resulting
bell-shaped curve. Find the frequency
f at which the maximum
response occurs, and show that if b
is small, the maximum occurs close
to, but not exactly at, f_0.

> Solution, p. 183


The function of problem
36, with a = 3, b = 1, and
f_0 = 1.


Problem 37. A set of light rays is emitted from the tip of the glamorous movie
star's nose on the film, and reunited to form a spot on the screen which is the
image of the same point on his nose. The distances have been distorted for
clarity. The distance y represents the entire length of the theater from front to
back.

37 In a movie theater, the
image on the screen is formed by
a lens in the projector, and originates
from one of the frames on
the strip of celluloid film (or, in the
newer digital projection systems,
from a liquid crystal chip). Let the
distance  from  the  film  to  the  lens 
be  x,  and  let  the  distance  from  the 
lens  to  the  screen  be  y.  The  pro¬ 
jectionist  needs  to  adjust  x  so  that 
it  is  properly  matched  with  y,  or 
else  the  image  will  be  out  of  focus. 
There  is  therefore  a  fixed  relation¬ 
ship  between  x  and  y,  and  this  re¬ 
lationship  is  of  the  form 

\[ \frac{1}{x} + \frac{1}{y} = \frac{1}{f}, \]

where f is a property of the lens,
called  its  focal  length.  A  stronger 
lens  has  a  shorter  focal  length. 
Since  the  theater  is  large,  and  the 
projector  is  relatively  small,  x  is 
much  less  than  y.  We  can  see 
from  the  equation  that  if  y  is  suffi¬ 
ciently  large,  the  left-hand  side  of 
the  equation  is  dominated  by  the 
1/x term, and we have x ≈ f.
Since the 1/y term doesn't completely
vanish, we must have x
slightly greater than f, so that the
1/x term is slightly less than 1/f.
Let x = f + dx, and approximate
dx as being infinitesimally small.
Find a simple expression for y in
terms of f and dx.

> Solution, p. 184

38 Why might the expression
1^∞ be considered an indeterminate
form? > Solution, p. 185


3  Limits  and  continuity 


3.1  Continuity 

Intuitively,  a  continuous  function 
is  one  whose  graph  has  no  sudden 
jumps  in  it;  the  graph  is  all  a  single 
connected  piece.  Such  a  function 
can  be  drawn  without  picking  the 
pen  up  off  of  the  paper.  Formally, 
a  function  f(x)  is  defined  to  be 
continuous  if  for  any  real  x  and  any 
infinitesimal  dx,  f(x  +  dx)  —  f(x) 
is  infinitesimal. 

Example  32 

Let the function f be defined by f(x) =
0 for x ≤ 0, and f(x) = 1 for x > 0.
Then f(x) is discontinuous, since for
dx > 0, f(0 + dx) - f(0) = 1, which isn't
infinitesimal.




a  /  Example  32.  The 
black  dot  indicates  that 
the  endpoint  of  the  lower 
ray  is  part  of  the  ray, 
while  the  white  one 
shows  the  contrary  for 
the  ray  on  the  top. 

If a function is discontinuous at a
given point, then it is not differentiable
at that point. On the other
hand, the example y = |x| shows
that a function can be continuous
without being differentiable.


In  most  cases,  there  is  no  need 
to  invoke  the  definition  explicitly 
in  order  to  check  whether  a  func¬ 
tion  is  continuous.  Most  of  the 
functions  we  work  with  are  de¬ 
fined  by  putting  together  simpler 
functions  as  building  blocks.  For 
example,  let’s  say  we’re  already 
convinced that the functions defined
by g(x) = 3x and h(x) =
sin x are both continuous. Then if
we encounter the function f(x) =
sin(3x), we can tell that it's continuous
because its definition corresponds
to f(x) = h(g(x)). The
functions  g  and  h  have  been  set 
up  like  a  bucket  brigade,  so  that 
g  takes  the  input,  calculates  the 
output,  and  then  hands  it  off  to 
h  for  the  final  step  of  the  calcu¬ 
lation.  This  method  of  combin¬ 
ing  functions  is  called  composition. 
The  composition  of  two  continuous 
functions  is  also  continuous.  Just 
watch out for division. The function
f(x) = 1/x is continuous everywhere
except at x = 0, so for
example 1/sin(x) is continuous everywhere
except at multiples of π,
where the sine has zeroes.

The intermediate value theorem

Another  way  of  thinking  about 
continuous  functions  is  given  by 
the  intermediate  value  theorem. 
Intuitively,  it  says  that  if  you  are 
moving  continuously  along  a  road, 
and  you  get  from  point  A  to  point 
B,  then  you  must  also  visit  every 
other  point  along  the  road;  only  by 
teleporting (by moving discontinuously)
could you avoid doing so.
More formally, the theorem states
that if y is a continuous real-valued
function on the real interval from a
to b, and if y takes on values y_1 and
y_2 at certain points within this interval,
then for any y_3 between y_1
and y_2, there is some real x in the
interval for which y(x) = y_3.


b  /  The  intermediate  value  theorem 
states that if the function is continuous,
it must pass through y_3.

The  intermediate  value  theorem 
seems  so  intuitively  appealing  that 
if  we  want  to  set  out  to  prove  it, 
we  may  feel  as  though  we’re  being 


asked  to  prove  a  proposition  such 
as,  “a  number  greater  than  10  ex¬ 
ists.”  If  a  friend  wanted  to  bet 
you  a  six-pack  that  you  couldn’t 
prove  this  with  complete  mathe¬ 
matical  rigor,  you  would  have  to 
get  your  friend  to  spell  out  very 
explicitly  what  she  thought  were 
the  facts  about  integers  that  you 
were  allowed  to  start  with  as  ini¬ 
tial  assumptions.  Are  you  allowed 
to  assume  that  1  exists?  Will  she 
grant  you  that  if  a  number  n  ex¬ 
ists,  so  does  n  +  1?  The  interme¬ 
diate  value  theorem  is  similar.  It’s 
stated  as  a  theorem  about  certain 
types  of  functions,  but  its  truth 
isn’t  so  much  a  matter  of  the  prop¬ 
erties  of  functions  as  the  properties 
of  the  underlying  number  system. 
For the reader with an interest in
pure  mathematics,  I’ve  discussed 
this  in  more  detail  on  page  156  and 
given  an  abbreviated  proof.  (Most 
introductory  calculus  texts  do  not 
prove  it  at  all.) 

Example  33 

> Show that there is a solution to the
equation 10^x + x = 1000.

> We expect there to be a solution
near x = 3, where the function f(x) =
10^x + x = 1003 is just a little too big.
On the other hand, f(2) = 102 is much
too  small.  Since  f  has  values  above 
and  below  1000  on  the  interval  from 
2  to  3,  and  f  is  continuous,  the  inter¬ 
mediate  value  theorem  proves  that  a 
solution  exists  between  2  and  3.  If  we 
wanted  to  find  a  better  numerical  ap¬ 
proximation  to  the  solution,  we  could 
do  it  using  Newton’s  method,  which  is 
introduced in section 5.1.
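As a rough numerical companion to this example, the sign change between 2 and 3 can be narrowed down by repeatedly halving the interval. Note that this is bisection, a different and slower technique than the Newton's method the text refers to; it is used here only because it relies on nothing but the intermediate value theorem. The function name bisect and the tolerance are my own choices.

```python
def f(x):
    """10^x + x - 1000, so a root of f solves 10^x + x = 1000."""
    return 10.0 ** x + x - 1000.0

def bisect(func, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a sign change of func until it is narrower than tol."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must have opposite signs at lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Keep the half-interval on which the sign change survives.
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

root = bisect(f, 2.0, 3.0)
print(root, f(root))
```

The root comes out just below 3, as expected from the fact that f(3) = 3 overshoots only slightly.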

Example  34 

> Show that there is at least one solution
to the equation cos x = x, and
give  bounds  on  its  location. 

>  This  is  a  transcendental  equation, 
and  no  amount  of  fiddling  with  alge¬ 
bra  and  trig  identities  will  ever  give  a 
closed-form  solution,  i.e.,  one  that  can 
be  written  down  with  a  finite  number  of 
arithmetic  operations  to  give  an  exact 
result.  However,  we  can  easily  prove 
that  at  least  one  solution  exists,  by 
applying the intermediate value theorem
to the function x - cos x. The
cosine function is bounded between
-1 and 1, so this function must be
negative for x < -1 and positive for
x > 1. By the intermediate value theorem,
there must be a solution in the
interval -1 < x < 1. The graph, c,
verifies  this,  and  shows  that  there  is 
only  one  solution. 
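What the graph shows can also be checked by sampling, as in the following sketch. It evaluates x - cos x on an evenly spaced grid (the grid size of 2000 steps is an arbitrary choice of mine) and counts how often neighboring samples differ in sign. A count like this supports the picture but is not a proof, since a sufficiently wiggly function could hide roots between grid points.

```python
import math

def h(x):
    """The function x - cos x from Example 34."""
    return x - math.cos(x)

# Sample h on an evenly spaced grid over [-1, 1].
n = 2000
xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]
values = [h(x) for x in xs]

# Count adjacent pairs of samples whose values have opposite signs.
sign_changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

print(values[0], values[-1])  # negative at x = -1, positive at x = 1
print(sign_changes)
```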


y 


c  /  The  function  x  —  cos  x 
constructed  in  example 
34. 

Example  35 

>  Prove  that  every  odd-order  polyno¬ 
mial  P  with  real  coefficients  has  at 


least  one  real  root  x,  i.e.,  a  point  at 
which  P(x)  =  0. 

> Example 34 might have given the
impression  that  there  was  nothing 
to  be  learned  from  the  intermediate 
value  theorem  that  couldn’t  be  deter¬ 
mined  by  graphing,  but  this  example 
clearly  can't  be  solved  by  graphing, 
because  we’re  trying  to  prove  a  gen¬ 
eral  result  for  all  polynomials. 

To see that the restriction to odd orders
is necessary, consider the polynomial
x^2 + 1, which has no real roots
because x^2 ≥ 0 for any real number
x.

To  fix  our  minds  on  a  concrete  ex¬ 
ample  for  the  odd  case,  consider  the 
polynomial P(x) = x^3 - x + 17. For
large values of x, the linear and constant
terms will be negligible compared
to the x^3 term, and since x^3
is  positive  for  large  values  of  x  and 
negative  for  large  negative  ones,  it  fol¬ 
lows  that  P  is  sometimes  positive  and 
sometimes  negative. 

Making  this  argument  more  general 
and  rigorous,  suppose  we  had  a  poly¬ 
nomial  of  odd  order  n  that  always  had 
the  same  sign  for  real  x.  Then  by  the 
transfer  principle  the  same  would  hold 
for  any  hyperreal  value  of  x.  Now  if  x 
is  infinite  then  the  lower-order  terms 
are infinitesimal compared to the x^n
term, and the sign of the result is determined
entirely by the x^n term, but
x^n and (-x)^n have opposite signs, and
therefore  P(x)  and  P(-x)  have  op¬ 
posite  signs.  This  is  a  contradiction, 
so  we  have  disproved  the  assumption 
that  P  always  had  the  same  sign  for 
real x. Since P is sometimes negative
and sometimes positive, we conclude
by the intermediate value theorem
that it is zero somewhere.
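The argument can be turned into a small numerical procedure, sketched below under my own naming. It doubles M until P(-M) and P(M) have opposite signs, which the dominance of the leading term guarantees will happen for odd order, and then bisects. The bisection step is my addition; the example itself only proves that a root exists.

```python
def poly(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from the highest power down."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def real_root_of_odd_polynomial(coeffs, tol=1e-12):
    """Find one real root of an odd-order polynomial by bracketing and bisection."""
    if (len(coeffs) - 1) % 2 == 0 or coeffs[0] == 0:
        raise ValueError("need an odd-order polynomial with nonzero leading coefficient")
    # Grow m until P(-m) and P(m) have opposite signs.
    m = 1.0
    while poly(coeffs, -m) * poly(coeffs, m) > 0:
        m *= 2.0
    lo, hi = -m, m
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if poly(coeffs, lo) * poly(coeffs, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# P(x) = x^3 - x + 17 from the example.
coeffs = [1.0, 0.0, -1.0, 17.0]
root = real_root_of_odd_polynomial(coeffs)
print(root, poly(coeffs, root))
```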

Example  36 

> Show that the equation x = sin(1/x)
has infinitely many solutions.

>  This  is  another  example  that  can’t 
be  solved  by  graphing;  there  is  clearly 
no  way  to  prove,  just  by  looking  at 
a  graph  like  d,  that  it  crosses  the  x 
axis  infinitely  many  times.  The  graph 
does,  however,  help  us  to  gain  intu¬ 
ition  for  what’s  going  on.  As  x  gets 
smaller and smaller, 1/x blows up,
and sin(1/x) oscillates more and more
rapidly. The function f(x) = x - sin(1/x)
is undefined
at 0, but it's continuous everywhere
else,  so  we  can  apply  the  intermedi¬ 
ate  value  theorem  to  any  interval  that 
doesn’t  include  0. 

We want to prove that for any positive
u, there exists an x with 0 < x < u
for which f(x) has either desired sign.
Suppose  that  this  fails  for  some  real 
u.  Then  by  the  transfer  principle  the 
nonexistence  of  any  real  x  with  the  de¬ 
sired  property  also  implies  the  nonex¬ 
istence  of  any  such  hyperreal  x.  But 
for  an  infinitesimal  x  the  sign  of  f  is 
determined  entirely  by  the  sine  term, 
since the sine term is finite and the linear
term infinitesimal. Clearly sin(1/x)
can’t  have  a  single  sign  for  all  values 
of  x  less  than  u,  so  this  is  a  contradic¬ 
tion,  and  the  proposition  succeeds  for 
any  u.  It  follows  from  the  intermediate 
value  theorem  that  there  are  infinitely 
many  solutions  to  the  equation. 
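A finite piece of this behavior can be made concrete numerically. In the sketch below, the sample points x_k = 1/(π/2 + kπ) are my own choice: at those points sin(1/x) is +1 or -1, alternating with k, so f(x_k) = x_k - sin(1/x_k) alternates in sign. Each adjacent pair with opposite signs brackets at least one root by the intermediate value theorem. Checking 20 points obviously proves nothing about infinitely many, but it shows the pattern that the transfer-principle argument establishes in general.

```python
import math

def f(x):
    """The function x - sin(1/x) from Example 36, defined for x != 0."""
    return x - math.sin(1.0 / x)

# At x_k = 1 / (pi/2 + k*pi) the sine term equals +1 or -1, alternating with k,
# and 0 < x_k < 1 for k >= 0, so the sign of f(x_k) alternates as well.
points = [1.0 / (math.pi / 2 + k * math.pi) for k in range(20)]
signs = [1 if f(x) > 0 else -1 for x in points]

# Each adjacent pair of points with opposite signs brackets at least one root.
brackets = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
print(points[-1], brackets)
```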


y 


d / The function
x - sin(1/x).

The  extreme  value  theorem 

In  chapter  1,  we  saw  that  locat¬ 
ing  maxima  and  minima  of  func¬ 
tions  may  in  general  be  fairly  dif¬ 
ficult,  because  there  are  so  many 
different  ways  in  which  a  function 
can  attain  an  extremum:  e.g.,  at 
an  endpoint,  at  a  place  where  its 
derivative is zero, or at a nondifferentiable
kink. The following theorem
allows us to make a very general
statement about all these possible
cases, assuming only continuity.

The extreme value theorem states
that if f is a continuous real-valued
function on the real-number interval
defined by a ≤ x ≤ b, then f
has maximum and minimum values
on that interval, which are attained
at specific points in the interval.

Let’s  first  see  why  the  assumptions 
are  necessary.  If  we  weren’t  con¬ 
fined  to  a  finite  interval,  then  y  =  x 
would  be  a  counterexample,  be¬ 
cause  it’s  continuous  and  doesn’t 
have any maximum or minimum
value. If we didn't assume continuity,
then we could have a function
defined as y = x for x < 1,
and y = 0 for x ≥ 1; this function
never gets bigger than 1, but
it never attains a value of 1 for any
specific value of x.

The  extreme  value  theorem  is 
proved,  in  a  somewhat  more  gen¬ 
eral  form,  on  page  159. 


Example  37 

> Find the maximum value of the polynomial
P(x) = x^3 + x^2 + x + 1 for
-5 ≤ x ≤ 5.

> Polynomials are continuous, so the
extreme  value  theorem  guarantees 
that  such  a  maximum  exists.  Suppose 
we  try  to  find  it  by  looking  for  a  place 
where  the  derivative  is  zero.  The 
derivative is 3x^2 + 2x + 1, and setting it
equal  to  zero  gives  a  quadratic  equa¬ 
tion,  but  application  of  the  quadratic 
formula  shows  that  it  has  no  real  so¬ 
lutions.  It  appears  that  the  function 
doesn’t  have  a  maximum  anywhere 
(even  outside  the  interval  of  interest) 
that  looks  like  a  smooth  peak.  Since  it 
doesn’t  have  kinks  or  discontinuities, 
there  is  only  one  other  type  of  maxi¬ 
mum  it  could  have,  which  is  a  maxi¬ 
mum  at  one  of  its  endpoints.  Plugging 
in the limits, we find P(-5) = -104
and P(5) = 156, so we conclude that
the  maximum  value  on  this  interval  is 
156. 
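The arithmetic in this example is easy to check by machine. The sketch below, with variable names of my own, computes the discriminant of the derivative 3x^2 + 2x + 1 and the values of P at the two endpoints.

```python
def p(x):
    """P(x) = x^3 + x^2 + x + 1 from Example 37."""
    return x**3 + x**2 + x + 1

# Discriminant b^2 - 4ac of the derivative 3x^2 + 2x + 1.
a, b, c = 3, 2, 1
discriminant = b * b - 4 * a * c

# A negative discriminant means the derivative has no real zeros,
# so the maximum must sit at an endpoint of the interval.
endpoint_values = {x: p(x) for x in (-5, 5)}
maximum = max(endpoint_values.values())
print(discriminant, endpoint_values, maximum)
```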
3.2 Limits

Historically,  the  calculus  of  in¬ 
finitesimals  as  created  by  New¬ 
ton  and  Leibniz  was  reinterpreted 
in  the  nineteenth  century  by 
Cauchy,  Bolzano,  and  Weierstrass 
in  terms  of  limits.  All  mathemati¬ 
cians  learned  both  languages,  and 
switched  back  and  forth  between 
them  effortlessly,  like  the  lady  I 
overheard  in  a  Southern  California 
supermarket  telling  her  mother, 
“Let’s  get  that  one,  con  los  nuts.” 
Those  who  had  been  trained  in  in¬ 
finitesimals  might  hear  a  statement 
using  the  language  of  limits,  but 
translate  it  mentally  into  infinites¬ 
imals;  to  them,  every  statement 
about  limits  was  really  a  state¬ 
ment  about  infinitesimals.  To  their 
younger  colleagues,  trained  using 
limits,  every  statement  about  in¬ 
finitesimals  was  really  to  be  under¬ 
stood  as  shorthand  for  a  limiting 
process.  When  Robinson  laid  the 
rigorous foundations for the hyperreal
number system in the 1960's, a
common  objection  was  that  it  was 
really  nothing  new,  because  ev¬ 
ery  statement  about  infinitesimals 
was  really  just  a  different  way  of 
expressing  a  corresponding  state¬ 
ment  about  limits;  of  course  the 
same  could  have  been  said  about 
Weierstrass’s  work  of  the  preced¬ 
ing  century!  In  reality,  all  prac¬ 
titioners  of  calculus  had  realized 
all  along  that  different  approaches 
worked  better  for  different  prob¬ 
lems;  problem  13  on  page  84  is  an 
example  of  a  result  that  is  much 


easier  to  prove  with  infinitesimals 
than  with  limits. 

The  Weierstrass  definition  of  a 
limit  is  this: 

Definition of the limit
We say that $\ell$ is the limit of the function $f(x)$ as $x$ approaches $a$, written

\[ \lim_{x \to a} f(x) = \ell, \]

if the following is true: for any positive real number $\epsilon$, there exists another positive real number $\delta$ such that for all $x$ in the interval $a - \delta < x < a + \delta$, other than $x = a$ itself, the value of $f$ lies within the range from $\ell - \epsilon$ to $\ell + \epsilon$.


Intuitively,  the  idea  is  that  if  I  want 
you to make $f(x)$ close to $\ell$, I just
have  to  tell  you  how  close,  and  you 
can  tell  me  that  it  will  be  that  close 
as  long  as  x  is  within  a  certain  dis¬ 
tance  of  a. 
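
The "tell me how close, and I will tell you how near to stay" game can be acted out numerically. The following is only a rough sketch in Python (used here instead of Yacas purely for illustration); the helper name `find_delta` and the choice of test function $1/(1+x)$ are assumptions made for the example, and sampling a finite number of points is evidence, not a proof.

```python
def find_delta(f, a, limit, eps, samples=1000):
    """Halve delta until every sampled x with 0 < |x - a| < delta
    satisfies |f(x) - limit| < eps. Returns None if no delta is found."""
    delta = 1.0
    while delta > 1e-12:
        ok = True
        for k in range(1, samples + 1):
            offset = delta * k / (samples + 1)  # strictly between 0 and delta
            for x in (a - offset, a + offset):
                if abs(f(x) - limit) >= eps:
                    ok = False
        if ok:
            return delta
        delta /= 2
    return None


def f(x):
    return 1 / (1 + x)


# Challenge: make f(x) lie within eps of 1. Response: a delta that works.
for eps in (0.1, 0.01, 0.001):
    print(eps, find_delta(f, 0.0, 1.0, eps))
```

Each smaller challenge $\epsilon$ is answered by a smaller $\delta$, which is the pattern the Weierstrass definition demands.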

In  terms  of  infinitesimals,  we  have: 

Definition of the limit
We say that $\ell$ is the limit of the function $f(x)$ as $x$ approaches $a$, written

\[ \lim_{x \to a} f(x) = \ell, \]

if the following is true: for any nonzero infinitesimal number $dx$, the value of $f(a + dx)$ is finite, and the standard part of $f(a + dx)$ equals $\ell$.
The two definitions are equivalent. As remarked previously, the
derivative $dx/dt$ can be defined as the limit
$\lim_{\Delta t \to 0} (\Delta x/\Delta t)$, and if we use the
Weierstrass definition of the limit, this means that the derivative
can be defined entirely in terms of the real number system, without
the use of hyperreal numbers.


Sometimes  a  limit  can  be  evaluated 
simply  by  plugging  in  numbers: 


Example 38

> Evaluate

\[ \lim_{x \to 0} \frac{1}{1 + x}. \]

> Plugging in $x = 0$, we find that the limit is 1.


In  some  examples,  plugging  in  fails 
if  we  try  to  do  it  directly,  but  can 
be  made  to  work  if  we  massage  the 
expression  into  a  different  form: 


Example 39

> Evaluate

\[ \lim_{x \to 0} \frac{\frac{2}{x} + 7}{\frac{1}{x} + 8686}. \]

> Plugging in $x = 0$ fails because division by zero is undefined.

Intuitively,  however,  we  expect  that  the 
limit  will  be  well  defined,  and  will  equal 
2,  because  for  very  small  values  of 
x,  the  numerator  is  dominated  by  the 
2/x  term,  and  the  denominator  by  the 
1  /x  term,  so  the  7  and  8686  terms  will 
matter  less  and  less  as  x  gets  smaller 
and  smaller. 


To demonstrate this more rigorously, a trick that works is to
multiply both the top and the bottom by $x$, giving

\[ \frac{2 + 7x}{1 + 8686x}, \]

which equals 2 when we plug in $x = 0$, so we find that the limit is 2.

This  example  is  a  little  subtle,  because 
when  x  equals  zero,  the  function  is  not 
defined,  and  moreover  it  would  not  be 
valid  to  multiply  both  the  top  and  the 
bottom  by  x.  In  general,  it’s  not  valid 
algebra  to  multiply  both  the  top  and 
the  bottom  of  a  fraction  by  0,  because 
the  result  is  0/0,  which  is  undefined. 
But  we  didn’t  actually  multiply  both  the 
top  and  the  bottom  by  zero,  because 
we  never  let  x  equal  zero.  Both  the 
Weierstrass  definition  and  the  defini¬ 
tion  in  terms  of  infinitesimals  only  re¬ 
fer  to  the  properties  of  the  function  in  a 
region  very  close  to  the  limiting  point, 
not  at  the  limiting  point  itself. 

This  is  an  example  in  which  the  func¬ 
tion  was  not  well  defined  at  a  certain 
point,  and  yet  the  limit  of  the  function 
was  well  defined  as  we  approached 
that  point.  In  a  case  like  this,  where 
there  is  only  one  point  missing  from 
the  domain  of  the  function,  it  is  natural 
to  extend  the  definition  of  the  function 
by  filling  in  the  “gap  tooth.”  Example 
41  below  shows  that  this  kind  of  filling- 
in  procedure  is  not  always  possible. 
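
As a numerical sanity check on example 39, here is a small Python sketch (Python rather than Yacas, chosen only for illustration; the function names `g` and `h` are made up for this example). It evaluates the original expression at inputs that are small but never zero, and compares it with the massaged form.

```python
def g(x):
    """Original expression; undefined at x = 0."""
    return (2 / x + 7) / (1 / x + 8686)


def h(x):
    """Form obtained by multiplying top and bottom by x."""
    return (2 + 7 * x) / (1 + 8686 * x)


# Approach 0 from both sides without ever plugging in 0 itself.
for x in (1e-3, 1e-6, 1e-9, -1e-9):
    print(x, g(x), h(x))
```

Because of the large constant 8686, the values only settle near 2 once $x$ is much smaller than $1/8686$.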


Example  40 

> Investigate the limiting behavior of $1/x^2$ as $x$ approaches 0, and 1.

> At $x = 1$, plugging in works, and we find that the limit is 1.

e / Example 40, the function $1/x^2$.

At  x  =  0,  plugging  in  doesn’t  work, 
because  division  by  zero  is  unde¬ 
fined.  Applying  the  definition  in  terms 
of  infinitesimals  to  the  limit  as  x  ap¬ 
proaches  0,  we  need  to  find  out 
whether $1/(0 + dx)^2$ is finite for infinitesimal
$dx$, and if so, whether it always has the same
standard part. But
clearly $1/(0 + dx)^2 = dx^{-2}$ is always
infinite,  and  we  conclude  that  this  limit 
is  undefined. 


f / Example 41, the function $\tan^{-1}(1/x)$.


Example  41 

> Investigate the limiting behavior of $f(x) = \tan^{-1}(1/x)$ as $x$ approaches 0.

> Plugging in doesn't work, because division by zero is undefined.

In the definition of the limit in terms of infinitesimals, the first
requirement is that $f(0 + dx)$ be finite for infinitesimal values of
$dx$. The graph makes this look plausible, and indeed we can prove
that it is true by the transfer principle. For any real $x$ we have
$-\pi/2 < f(x) < \pi/2$, and by the transfer principle this holds for
the hyperreals as well, and therefore $f(0 + dx)$ is finite.

The second requirement is that the standard part of $f(0 + dx)$ have a
uniquely defined value. The graph shows that we really have two cases
to consider, one on the right side of the graph, and one on the left.
Intuitively, we expect that the standard part of $f(0 + dx)$ will
equal $\pi/2$ for positive $dx$, and $-\pi/2$ for negative, and thus
the second part of the definition will not be satisfied. For a more
formal proof, we can use the transfer principle. For real $x$ with
$0 < x < 1$, for example, $f$ is always positive and greater than
$\pi/4$, so we conclude based on the transfer principle that
$f(0 + dx) > \pi/4$ for positive infinitesimal $dx$. But on similar
grounds we can be sure that $f(0 + dx) < -\pi/4$ when $dx$ is negative
and infinitesimal. Thus the standard part of $f(0 + dx)$ can have
different values for different infinitesimal values of $dx$, and we
conclude that the limit is undefined.
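
The two-sided disagreement can also be seen numerically. This is a minimal Python sketch (not Yacas; the function name `f` just mirrors the example, and the sample offsets are arbitrary choices): it evaluates $\tan^{-1}(1/x)$ a little to the right and a little to the left of zero.

```python
import math


def f(x):
    """The function from example 41; undefined at x = 0."""
    return math.atan(1 / x)


# Compare values just above and just below zero.
for dx in (1e-3, 1e-6, 1e-9):
    print(dx, f(dx), f(-dx))

print("pi/2 =", math.pi / 2)
```

The right-hand values crowd toward $\pi/2$ and the left-hand values toward $-\pi/2$, so no single number can serve as the limit.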

In  examples  like  this,  we  can  define 
a  kind  of  one-sided  limit,  notated  like 
this: 


\[ \lim_{x \to 0^+} \tan^{-1}\frac{1}{x} = \frac{\pi}{2}, \]

\[ \lim_{x \to 0^-} \tan^{-1}\frac{1}{x} = -\frac{\pi}{2}, \]

where the notations $x \to 0^-$ and
$x \to 0^+$ are to be read “as $x$ approaches
zero from below,” and “as $x$
approaches zero from above.”

3.3  L'Hôpital's rule

Consider  the  limit 

\[ \lim_{x \to 0} \frac{\sin x}{x}. \]


Plugging  in  doesn’t  work,  because 
we  get  0/0.  Division  by  zero  is 
undefined,  both  in  the  real  num¬ 
ber  system  and  in  the  hyperreals. 
A  nonzero  number  divided  by  a 
small  number  gives  a  big  number;  a 
nonzero  number  divided  by  a  very 
small  number  gives  a  very  big  num¬ 
ber;  and  a  nonzero  number  divided 
by  an  infinitesimal  number  gives 
an  infinite  number.  On  the  other 
hand,  dividing  zero  by  zero  means 
looking for a solution to the equation
$0 = 0q$, where $q$ is the result
of  the  division.  But  any  q  is  a 
solution  of  this  equation,  so  even 
speaking  casually,  it’s  not  correct 
to  say  that  0/0  is  infinite;  it’s  not 
infinite,  it’s  anything  we  like. 

Since  plugging  in  zero  didn’t  work, 
let’s  try  estimating  the  limit  by 
plugging  in  a  number  for  x  that’s 
small,  but  not  zero.  On  a  calcula¬ 
tor, 


\[ \frac{\sin 0.00001}{0.00001} = 0.999999999983333. \]


It  looks  like  the  limit  is  1.  We  can 
confirm  our  conjecture  to  higher 
precision  using  Yacas’s  ability  to 
do  high-precision  arithmetic: 


  N(Sin(10^-20)/10^-20,50)
  0.99999999999999999999999999999999999999998333333333


It’s  looking  pretty  one-ish.  This  is 
the  idea  of  the  Weierstrass  defini¬ 
tion  of  a  limit:  it  seems  like  we  can 
get  an  answer  as  close  to  1  as  we 
like,  if  we’re  willing  to  make  x  as 
close  to  0  as  necessary.  The  graph 
helps  to  make  this  plausible. 
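
The same experiment can be repeated in ordinary double-precision arithmetic. The sketch below is in Python rather than Yacas, and the function name `ratio` is invented for the example; it prints $\sin x/x$ for a shrinking sequence of inputs, along with the gap between the result and 1.

```python
import math


def ratio(x):
    """sin(x)/x for nonzero x."""
    return math.sin(x) / x


# Shrink x by factors of ten and watch the gap from 1 shrink too.
for k in range(1, 8):
    x = 10.0 ** -k
    print(x, ratio(x), 1 - ratio(x))
```

The gap shrinks roughly like $x^2/6$ until it falls below what double precision can resolve, which is why the text turns to Yacas's high-precision arithmetic for very small $x$.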


g / The graph of $\sin x/x$.


The general idea here is that for
small values of $x$, the small-angle
approximation $\sin x \approx x$ holds,
and as $x$ gets smaller and smaller,
the approximation gets better and
better, so $\sin x/x$ gets closer and
closer to 1.


But we still haven't proved rigorously that the limit is exactly 1.
Let's try using the definition of the limit in terms of infinitesimals.

\[ \lim_{x \to 0} \frac{\sin x}{x}
   = \mathrm{st}\left[\frac{\sin(0 + dx)}{0 + dx}\right]
   = \mathrm{st}\left[\frac{dx + \ldots}{dx}\right], \]

where we've used the identity
$\sin(p + q) = \sin p \cos q + \sin q \cos p$,
and $\ldots$ stands for terms of order $dx^2$. So

\[ \lim_{x \to 0} \frac{\sin x}{x}
   = \mathrm{st}\left[1 + \frac{\ldots}{dx}\right]
   = 1. \]

In  fact,  this  limit  is  the  same  one 
we  would  use  if  we  were  evaluat¬ 
ing  the  derivative  of  the  sine  func¬ 
tion,  applying  the  definition  of  the 
derivative  as  a  limit. 

We  can  check  our  work  using  Inf: 

: (sin d)/d
1+(-0.16667)d^2+...

(The  ...  is  where  I’ve  snipped 
trailing  terms  from  the  output.) 

Our example involving the limit of $\sin x/x$ is a special case of the
following rule for calculating limits involving 0/0:

L'Hôpital's rule (simplest form)

If $u$ and $v$ are functions with $u(a) = 0$ and $v(a) = 0$, the
derivatives $\dot{u}(a)$ and $\dot{v}(a)$ are defined, and the
derivative $\dot{v}(a) \neq 0$, then

\[ \lim_{x \to a} \frac{u}{v} = \frac{\dot{u}(a)}{\dot{v}(a)}. \]

Proof: Since $u(a) = 0$, and the derivative $du/dx$ is defined at $a$,
$u(a + dx) = du$ is infinitesimal, and likewise for $v$. By the
definition of the limit, the limit is the standard part of

\[ \frac{u}{v} = \frac{du}{dv} = \frac{du/dx}{dv/dx}, \]

where by assumption the numerator and denominator are both defined
(and finite, because the derivative is defined in terms of the
standard part). The standard part of a quotient like $p/q$ equals the
quotient of the standard parts, provided that both $p$ and $q$ are
finite (which we've established), and $q \neq 0$ (which is true by
assumption). But the standard part of $du/dx$ is the definition of the
derivative $\dot{u}$, and likewise for $dv/dx$, so this establishes
the result.

We will generalize l'Hôpital's rule
on p. 65.

By the way, the housetop accent
on the “ô” in l'Hôpital means that
in Old French it used to be spelled
and pronounced “l'Hospital,” but
the “s” later became silent, so they
stopped writing it. So yes, it is the
same word as “hospital.”

Example  42 

As remarked above, the example of $\lim_{x \to 0} \sin x/x$ is in some
sense circular, since the limit is equivalent to the definition of the
derivative of the sine function, so we already need to know the limit
in order to evaluate the limit! As an example that isn't circular,
let's evaluate

\[ \lim_{x \to 0} \frac{\sin x}{x + x^3}. \]

The derivative of the top is $\cos x$, and the derivative of the
bottom is $1 + 3x^2$. Evaluating these at $x = 0$ gives 1 and 1, so
the answer is $1/1 = 1$.
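
The rule can be spot-checked numerically. The following Python sketch (Python instead of Yacas; the helper names `derivative` and `lhopital_estimate` are hypothetical, and the central-difference step `h` is an arbitrary choice) estimates $\dot{u}(a)/\dot{v}(a)$ with finite differences and compares it with the quotient $u/v$ evaluated close to $a$.

```python
import math


def derivative(f, a, h=1e-6):
    """Central-difference estimate of the derivative of f at a."""
    return (f(a + h) - f(a - h)) / (2 * h)


def lhopital_estimate(u, v, a):
    """Ratio of the estimated derivatives at a (simplest form of the rule)."""
    return derivative(u, a) / derivative(v, a)


def u(x):
    return math.sin(x)


def v(x):
    return x + x ** 3


print("ratio of derivatives at 0:", lhopital_estimate(u, v, 0.0))
print("u/v evaluated near 0:     ", u(1e-4) / v(1e-4))
```

Both numbers come out close to 1, in agreement with example 42. This is only a check, since the finite differences are themselves approximations.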


Example 43

> Evaluate

\[ \lim_{x \to 0} \frac{e^x - 1}{x}. \]

> Taking the derivatives of the top and bottom, we find $e^x/1$, which
equals 1 when evaluated at $x = 0$.


Example 44

> Evaluate

\[ \lim_{x \to 1} \frac{x - 1}{x^2 - 2x + 1}. \]

> Plugging in $x = 1$ fails, because both the top and the bottom are
zero. Taking the derivatives of the top and bottom, we find
$1/(2x - 2)$, which blows up to infinity when $x = 1$. To symbolize
the fact that the limit is undefined, and undefined because it blows
up to infinity, we write

\[ \lim_{x \to 1} \frac{x - 1}{x^2 - 2x + 1} = \infty. \]


3.4  Another perspective on indeterminate forms

An expression like 0/0, called an indeterminate form, can be thought
of in a different way in terms of infinitesimals. Suppose I tell you I
have two infinitesimal numbers $d$ and $e$ in my pocket, and I ask you
whether $d/e$ is finite, infinite, or infinitesimal. You can't tell,
because $d$ and $e$ might not be infinitesimals of the same order of
magnitude. For instance, if $e = 37d$, then $d/e = 1/37$ is finite;
but if $e = d^2$, then $d/e$ is infinite; and if $d = e^2$, then $d/e$
is infinitesimal. Acting this out with numbers that are small but not
infinitesimal,

\[ \frac{.001}{.037} = \frac{1}{37} \]
\[ \frac{.001}{.000001} = 1000 \]
\[ \frac{.000001}{.001} = .001. \]

On the other hand, suppose I tell you I have an infinitesimal number
$d$ and a finite number $x$, and I ask you to speculate about $d/x$.
You know for sure that it's going to be infinitesimal. Likewise, you
can be sure that $x/d$ is infinite. These aren't indeterminate forms.


We can do something similar with infinite numbers. If $H$ and $K$ are
both infinite, then $H - K$ is indeterminate. It could be infinite,
for example, if $H$ was positive infinite and $K = H/2$. On the other
hand, it could be finite if $H = K + 1$. Acting this out with big but
finite numbers,

\[ 1000 - 500 = 500 \]
\[ 1001 - 1000 = 1. \]

Example 45

> If $H$ is a positive infinite number, is $\sqrt{H + 1} - \sqrt{H - 1}$
finite, infinite, infinitesimal, or indeterminate?

> Trying it with a finite, big number, we have

\[ \sqrt{1000001} - \sqrt{999999} \approx 1.000000000 \times 10^{-3}, \]

which is clearly a wannabe infinitesimal. We can verify the result
using Inf:

: H=1/d
d^-1

: sqrt(H+1)-sqrt(H-1)
d^1/2+0.125d^5/2+...

For convenience, the first line of input defines an infinite number
$H$ in terms of the calculator's built-in infinitesimal $d$. The
result has only positive powers of $d$, so it's clearly infinitesimal.

More rigorously, we can rewrite the expression as
$\sqrt{H}\left(\sqrt{1 + 1/H} - \sqrt{1 - 1/H}\right)$. Since the
derivative of the square root function $\sqrt{x}$ evaluated at $x = 1$
is 1/2, we can approximate this as

\[ \sqrt{H}\left[1 + \frac{1}{2H} + \ldots - 1 + \frac{1}{2H} + \ldots\right]
   = \sqrt{H}\left[\frac{1}{H} + \ldots\right]
   = \frac{1}{\sqrt{H}} + \ldots, \]

which is infinitesimal.


3.5  Limits at infinity

The  definition  of  the  limit  in  terms 
of  infinitesimals  extends  immedi¬ 
ately  to  limiting  processes  where 
x  gets  bigger  and  bigger,  rather 
than closer and closer to some finite
value. For example, the function
$3 + 1/x$ clearly gets closer
and closer to 3 as $x$ gets bigger
and  bigger.  If  a  is  an  infinite 
number,  then  the  definition  says 
that  evaluating  this  expression  at 
a  +  dx,  where  dx  is  infinitesimal, 
gives  a  result  whose  standard  part 
is  3.  It  doesn’t  matter  that  a 
happens  to  be  infinite,  the  defini¬ 
tion  still  works.  We  also  note  that 
in  this  example,  it  doesn’t  matter 
what  infinite  number  a  is;  the  limit 
equals  3  for  any  infinite  a.  We  can 
write  this  fact  as 

\[ \lim_{x \to \infty} \left(3 + \frac{1}{x}\right) = 3, \]

where the symbol $\infty$ is to be interpreted
as “nyeah nyeah, I don’t
even care what infinite number you
put in here, I claim it will work
out to 3 no matter what.” The
symbol $\infty$ is not to be interpreted
as  standing  for  any  specific  infinite 
number.  That  would  be  the  type 
of  fallacy  that  lay  behind  the  bo¬ 
gus  proof  on  page  30  that  1  =  1/2, 
which  assumed  that  all  infinities 
had  to  be  the  same  size. 

A somewhat different example is the arctangent function. The
arctangent of 1000 equals approximately 1.5698, and inputting bigger
and bigger numbers gives answers that appear to get closer and closer
to $\pi/2 \approx 1.5708$. But the arctangent of $-1000$ is
approximately $-1.5698$, i.e., very close to $-\pi/2$. From these
numerical observations, we conjecture that

\[ \lim_{x \to a} \tan^{-1} x \]

equals $\pi/2$ for positive infinite $a$, but $-\pi/2$ for negative
infinite $a$. It would not be correct to write

\[ \lim_{x \to \infty} \tan^{-1} x = \frac{\pi}{2} \quad \text{[wrong]}, \]

because it does matter what infinite number we pick. Instead we write

\[ \lim_{x \to +\infty} \tan^{-1} x = \frac{\pi}{2} \]

\[ \lim_{x \to -\infty} \tan^{-1} x = -\frac{\pi}{2}. \]

Some  expressions  don’t  have  this 
kind  of  limit  at  all.  For  exam¬ 
ple,  if  you  take  the  sines  of  big 
numbers  like  a  thousand,  a  million, 
etc.,  on  your  calculator,  the  re¬ 
sults  are  essentially  random  num¬ 
bers  lying  between  —1  and  1.  They 
don’t  settle  down  to  any  particular 
value,  because  the  sine  function  os¬ 
cillates  back  and  forth  forever.  To 
prove formally that $\lim_{x \to +\infty} \sin x$
is undefined, consider that the sine
function, defined on the real numbers,
has the property that you
can always change its result by at
least 0.1 if you add either 1.5 or
$-1.5$ to its input. For example,
$\sin(.8) \approx 0.717$, and $\sin(.8 - 1.5) \approx -0.644$.
Applying the transfer
principle to this statement, we find
that the same is true on the hyperreals.
Therefore there cannot be
any value $\ell$ that differs infinitesimally
from $\sin a$ for all positive infinite
values of $a$.
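
The property of the sine function used in this argument is easy to probe numerically. The sketch below is in Python (not Yacas); the function name `biggest_change` and the sampling grid are assumptions made for illustration, and a finite scan is a plausibility check rather than a proof.

```python
import math


def biggest_change(x, step=1.5):
    """Larger of the two changes in sin when the input moves by +step or -step."""
    up = abs(math.sin(x + step) - math.sin(x))
    down = abs(math.sin(x - step) - math.sin(x))
    return max(up, down)


# Scan inputs from 0 to 100 in steps of 0.001 and record the worst case.
smallest = min(biggest_change(k * 0.001) for k in range(100_000))
print("smallest achievable change on the grid:", smallest)
```

On this grid the smallest change is comfortably above the 0.1 quoted in the text, so the bound used in the argument is a cautious one.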

Often  we’re  interested  in  finding 
the  limit  as  x  approaches  infinity 
of  an  expression  that  is  written  as 
an indeterminate form like $H/K$,
where  both  H  and  K  are  infinite. 

Example  46 

>  Evaluate  the  limit 

\[ \lim_{x \to \infty} \frac{2x + 7}{x + 8686}. \]

>  Intuitively,  if  x  gets  large  enough  the 
constant  terms  will  be  negligible,  and 
the  top  and  bottom  will  be  dominated 
by  the  2x  and  x  terms,  respectively, 
giving  an  answer  that  approaches  2. 

One  way  to  verify  this  is  to  divide  both 
the  top  and  the  bottom  by  x,  giving 


\[ \frac{2 + \frac{7}{x}}{1 + \frac{8686}{x}}. \]


If  x  is  infinite,  then  the  standard  part 
of  the  top  is  2,  the  standard  part  of  the 
bottom  is  1 ,  and  the  standard  part  of 
the  whole  thing  is  therefore  2. 

Another approach is to use l'Hôpital's
rule.  The  derivative  of  the  top  is  2,  and 
the  derivative  of  the  bottom  is  1 ,  so  the 
limit  is  2/1=2. 
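
Acting this out with big but finite numbers gives a quick check. This is a minimal Python sketch (Python rather than Yacas; the function name `q` is made up for the example).

```python
def q(x):
    """The quotient from example 46."""
    return (2 * x + 7) / (x + 8686)


# Let x grow by factors of a thousand.
for x in (1e3, 1e6, 1e9, 1e12):
    print(x, q(x))
```

For $x = 1000$ the constant 8686 still dominates the denominator and the quotient is well below 1; only when $x$ is much larger than 8686 do the values settle toward 2.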

3.6  Generalizations of l'Hôpital's rule

Mathematical  theorems  are  some¬ 
times  like  cars.  I  own  a  Honda  Fit 
that  is  about  as  bare-bones  as  you 
can get these days, but persuading
a dealer to sell me that car
was  like  pulling  teeth.  The  sales¬ 
man  was  absolutely  certain  that 
any  sane  customer  would  want  to 
pay  an  extra  $1,800  for  such  cru¬ 
cial  amenities  as  floor  mats  and  a 
chrome tailpipe. L'Hôpital's rule
in  its  most  general  form  is  a  much 
fancier  piece  of  machinery  than 
the  stripped  down  model  described 
on  p.  61.  The  price  you  pay  for 
the  deluxe  model  is  that  the  proof 
becomes  much  more  complicated 
than  the  one-liner  that  sufficed  for 
the  simple  version. 


Multiple applications of the rule

In the following example, we have to use l'Hôpital's rule twice before
we get an answer.

Example 47

> Evaluate

\[ \lim_{x \to \pi} \frac{1 + \cos x}{(x - \pi)^2}. \]

> Applying l'Hôpital's rule gives

\[ \frac{-\sin x}{2(x - \pi)}, \]

which still produces 0/0 when we plug in $x = \pi$. Going again, we get

\[ \frac{-\cos x}{2} = \frac{1}{2}. \]

The reason that this always works is outlined on p. 152.


The indeterminate form $\infty/\infty$

Consider an example like this:

\[ \lim_{x \to 0} \frac{1 + 1/x}{1 + 2/x}. \]

This is an indeterminate form like $\infty/\infty$ rather than the 0/0
form for which we've already proved l'Hôpital's rule. As proved on
p. 153, l'Hôpital's rule applies to examples like this as well.

Example 48

> Evaluate

\[ \lim_{x \to 0} \frac{1 + 1/x}{1 + 2/x}. \]

> Both the numerator and the denominator go to infinity.
Differentiation of the top and bottom gives
$(-x^{-2})/(-2x^{-2}) = 1/2$. We can see that the reason the rule
worked was that (1) the constant terms were irrelevant because they
become negligible as the $1/x$ terms blow up; and (2) differentiating
the blowing-up $1/x$ terms makes them into the same $x^{-2}$ on top
and bottom, which cancel.

Note that we could also have gotten this result without l'Hôpital's
rule, simply by multiplying both the top and the bottom of the
original expression by $x$ in order to rewrite it as $(x + 1)/(x + 2)$.

Limits  at  infinity 

It is straightforward to prove a
variant of l'Hôpital's rule that allows
us to do limits at infinity. The
general proof is left as an exercise
(problem 8, p. 68). The result is
that l'Hôpital's rule is equally valid
when the limit is at $\pm\infty$ rather
than at some real number $a$.


Example 49

> Evaluate

\[ \lim_{x \to \infty} \frac{2x + 7}{x + 8686}. \]

> We could use a change of variable to make this into example 39 on
p. 59, which was solved using an ad hoc and multiple-step procedure.
But having established the more general form of l'Hôpital's rule, we
can do it in one step. Differentiation of the top and bottom produces

\[ \lim_{x \to \infty} \frac{2x + 7}{x + 8686}
   = \lim_{x \to \infty} \frac{2}{1} = 2. \]




CHAPTER  3.  LIMITS  AND  CONTINUITY 


Problems

1  (a) Prove, using the Weierstrass definition of the limit, that if
$\lim_{x \to a} f(x) = F$ and $\lim_{x \to a} g(x) = G$ both exist,
then $\lim_{x \to a} [f(x) + g(x)] = F + G$, i.e., that the limit of a
sum is the sum of the limits. (b) Prove the same thing using the
definition of the limit in terms of infinitesimals.

> Solution, p. 185

2  Sketch the graph of the function $e^{-1/x}$, and evaluate the
following four limits:

\[ \lim_{x \to 0^+} e^{-1/x} \]
\[ \lim_{x \to 0^-} e^{-1/x} \]
\[ \lim_{x \to +\infty} e^{-1/x} \]
\[ \lim_{x \to -\infty} e^{-1/x} \]

> Solution, p. 185

3  Verify the following limits.

\[ \lim_{s \to 1} \ldots \]
\[ \lim_{\theta \to 0} \frac{1 - \cos\theta}{\theta^2} = \frac{1}{2} \]
\[ \lim_{x \to \infty} \frac{5x^2 - 2x}{x} = \infty \]
\[ \lim_{n \to \infty} \frac{n(n + 1)}{(n + 2)(n + 3)} = 1 \]
\[ \lim_{x \to \infty} \frac{ax^2 + bx + c}{dx^2 + ex + f} = \frac{a}{d} \]

> Solution, p. 185  [Granville, 1911]

4  Evaluate

\[ \lim_{x \to 0} \frac{x \cos x}{1 - 2^x} \]

exactly, and check your result by numerical approximation.

> Solution, p. 186

5  Amy is asked to evaluate

\[ \lim_{x \to 0} \frac{x}{e^x}. \]

She applies l'Hôpital's rule, differentiating top and bottom to find
$1/e^x$, which equals 1 when she plugs in $x = 0$. What is wrong with
her reasoning?

> Solution, p. 187

6  Evaluate

\[ \lim_{u \to 0} \frac{\ldots}{e^u + e^{-u} - 2} \]

exactly, and check your result by numerical approximation.

> Solution, p. 187

7  Evaluate

\[ \lim_{t \to \pi} \frac{\sin t}{t - \pi} \]

exactly, and check your result by numerical approximation.

> Solution, p. 187

8  Prove a form of l'Hôpital's rule stating that

\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} \]

is equal to the limit of $f'/g'$ at infinity. Hint: change to some new
variable $u$ such that $x \to \infty$ corresponds to $u \to 0$.

> Solution, p. 187

9  Prove that the linear function $y = ax + b$, where $a$ and $b$ are
real, is continuous, first using the definition of continuity in terms
of infinitesimals, and then using the definition in terms of the
Weierstrass limit.

> Solution, p. 187




4  Integration 


4.1  Definite and indefinite integrals

Because  any  formula  can  be  differ¬ 
entiated  symbolically  to  find  an¬ 
other  formula,  the  main  motiva¬ 
tion  for  doing  derivatives  numeri¬ 
cally  would  be  if  the  function  to 
be  differentiated  wasn’t  known  in 
symbolic  form.  A  typical  exam¬ 
ple  might  be  a  two-person  network 
computer  game,  in  which  player 
A’s  computer  needs  to  figure  out 
player  B’s  velocity  based  on  knowl¬ 
edge  of  how  her  position  changes 
over  time.  But  in  most  cases,  it’s 
numerical  integration  that’s  inter¬ 
esting,  not  numerical  differentia¬ 
tion. 

As  a  warm-up,  let’s  see  how  to  do 
a  running  sum  of  a  discrete  func¬ 
tion  using  Yacas.  The  following 
program computes the sum $1 + 2 + \ldots + 100$
discussed on page
7. Now that we’re writing real
computer  programs  with  Yacas,  it 
would  be  a  good  idea  to  enter  each 
program  into  a  file  before  trying  to 
run  it.  In  fact,  some  of  these  exam¬ 
ples  won’t  run  properly  if  you  just 
start  up  Yacas  and  type  them  in 
one  line  at  a  time.  If  you’re  using 
Adobe  Reader  to  read  this  book, 
you  can  do  Tools>Basic>Select, 
select  the  program,  copy  it  into  a 
file,  and  then  edit  out  the  line  num¬ 


bers. 

Example  50 

1 n := 1;
2 sum := 0;
3 While (n<=100) [
4   sum := sum+n;
5   n := n+1;
6 ];
7 Echo(sum);

The  semicolons  are  to  separate  one 
instruction  from  the  next,  and  they 
become  necessary  now  that  we’re 
doing  real  programming.  Line  1 
of  this  program  defines  the  vari¬ 
able  n,  which  will  take  on  all  the 
values  from  1  to  100.  Line  2  says 
that  we  haven’t  added  anything  up 
yet,  so  our  running  sum  is  zero  so 
far.  Line  3  says  to  keep  on  re¬ 
peating  the  instructions  inside  the 
square  brackets  until  n  goes  past 
100.  Line  4  updates  the  running 
sum,  and  line  5  updates  the  value 
of  n.  If  you’ve  never  done  any  pro¬ 
gramming  before,  a  statement  like 
n:=n+1 might seem like nonsense:
how can a number equal itself
plus  one?  But  that’s  why  we  use 
the  :=  symbol;  it  says  that  we’re 
redefining  n,  not  stating  an  equa¬ 
tion.  If  n  was  previously  37,  then 
after  this  statement  is  executed,  n 
will  be  redefined  as  38.  To  run  the 
program  on  a  Linux  computer,  do 
this (assuming you saved the program
in a file named sum.yacas):

% yacas -pc sum.yacas
5050

Here  the  %  symbol  is  the  com¬ 
puter’s  prompt.  The  result  is 
5,050,  as  expected.  One  way  of 
stating  this  result  is 

\[ \sum_{n=1}^{100} n = 5050. \]

The capital Greek letter $\Sigma$, sigma,
is  used  because  it  makes  the  “s” 
sound,  and  that’s  the  first  sound  in 
the  word  “sum.”  The  n  =  1  below 
the  sigma  says  the  sum  starts  at  1, 
and  the  100  on  top  says  it  ends  at 
100.  The  n  is  what’s  known  as  a 
dummy  variable:  it  has  no  mean¬ 
ing  outside  the  context  of  the  sum. 
Figure  a  shows  the  graphical  inter¬ 
pretation  of  the  sum:  we’re  adding 
up  the  areas  of  a  series  of  rectan¬ 
gular  strips.  (For  clarity,  the  figure 
only  shows  the  sum  going  up  to  7, 
rather  than  100.) 
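
For readers who don't have Yacas installed, here is a rough Python counterpart of example 50. It is only a sketch: the function names `running_sum` and `gauss_formula` are invented for illustration, and the closed form $(n^2 + n)/2$ is the one derived in section 1.1.

```python
def running_sum(n_max):
    """Add up 1 + 2 + ... + n_max one term at a time, as in example 50."""
    total = 0
    n = 1
    while n <= n_max:
        total = total + n   # update the running sum
        n = n + 1           # redefine n, like n := n+1 in Yacas
    return total


def gauss_formula(n_max):
    """Closed-form value (n^2 + n)/2 for the same sum."""
    return (n_max ** 2 + n_max) // 2


print(running_sum(100), gauss_formula(100))
print(running_sum(7), gauss_formula(7))
```

The loop and the formula agree, both for the sum to 100 and for the smaller sum to 7 shown in figure a.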


a / Graphical interpretation of the sum $1 + 2 + \ldots + 7$.


Now  how  about  an  integral?  Fig¬ 
ure  b  shows  the  graphical  inter¬ 


pretation  of  what  we’re  trying  to 
do:  find  the  area  of  the  shaded 
triangle.  This  is  an  example  we 
know  how  to  do  symbolically,  so 
we  can  do  it  numerically  as  well, 
and  check  the  answers  against  each 
other.  Symbolically,  the  area  is 
given by the integral. To integrate
the function $\dot{x}(t) = t$, we
know we need some function with
a $t^2$ in it, since we want something
whose derivative is $t$, and differentiation
reduces the power by one.
The derivative of $t^2$ would be $2t$
rather than $t$, so what we want is
$x(t) = t^2/2$. Let’s compute the
area of the triangle that stretches
along the $t$ axis from 0 to 100:
$x(100) = 100^2/2 = 5000$.


b / Graphical interpretation of the integral of the
function $\dot{x}(t) = t$.


Figure  c  shows  how  to  accomplish 
the  same  thing  numerically.  We 
break  up  the  area  into  a  whole 
bunch  of  very  skinny  rectangles. 
Ideally, we’d like to make the width
of each rectangle be an infinitesimal
number $dt$, so that we’d be
adding up an infinite number of infinitesimal
areas. In reality, a computer
can’t do that, so we divide up
the interval from $t = 0$ to $t = 100$
into $H$ rectangles, each with finite
width $dt = 100/H$. Instead
of  making  H  infinite,  we  make  it 
the  largest  number  we  can  without 
making  the  computer  take  too  long 
to  add  up  the  areas  of  the  rectan¬ 
gles. 


c  /  Approximating  the  in¬ 
tegral  numerically. 


Example  51 

tmax := 100;
H := 1000;
dt := tmax/H;
sum := 0;
t := 0;
While (t<=tmax) [
  sum := N(sum+t*dt);
  t := N(t+dt);
];
Echo(sum);

In  example  51,  we  split  the  in¬ 
terval  from  t  =  0  to  100  into 
H  =  1000  small  intervals,  each 
with  width  dt  =  0.1.  The  result  is 
5,005,  which  agrees  with  the  sym¬ 


bolic  result  to  three  digits  of  preci¬ 
sion.  Changing  H  to  10,000  gives 
5,000.5,  which  is  one  more  digit. 
Clearly  as  we  make  the  number 
of  rectangles  greater  and  greater, 
we’re  converging  to  the  correct  re¬ 
sult  of  5,000. 
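
The convergence can be reproduced in Python as well. The following is a sketch rather than a translation of the author's program: the function name `riemann_sum` is hypothetical, and it steps through the rectangles with an integer index instead of repeatedly adding `dt`, an assumption made here to avoid accumulated rounding error in `t`.

```python
def riemann_sum(f, a, b, H):
    """Sum f(t)*dt over t = a, a+dt, ..., b, matching While (t<=tmax)."""
    dt = (b - a) / H
    total = 0.0
    for i in range(H + 1):          # i = 0, 1, ..., H
        total += f(a + i * dt) * dt
    return total


def xdot(t):
    """The function being integrated, x-dot(t) = t."""
    return t


for H in (1000, 10_000, 100_000):
    print(H, riemann_sum(xdot, 0.0, 100.0, H))
```

With $H = 1000$ and $H = 10{,}000$ this gives values very close to the 5,005 and 5,000.5 quoted above, and larger $H$ moves the result closer to 5,000.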

In the Leibniz notation, the thing
we’ve just calculated, by two different
techniques, is written like this:

\[ \int_0^{100} t \, dt = 5{,}000 \]

It looks a lot like the $\Sigma$ notation,
with the $\Sigma$ replaced by a flattened-out
letter “S.” The $t$ is a dummy
variable.  What  I’ve  been  casually 
referring  to  as  an  integral  is  re¬ 
ally  two  different  but  closely  re¬ 
lated  things,  known  as  the  definite 
integral  and  the  indefinite  integral. 

Definition of the indefinite integral
If $\dot{x}$ is a function, then a function
$x$ is an indefinite integral of $\dot{x}$ if, as
implied by the notation, $dx/dt = \dot{x}$.

Interpretation:  Doing  an  indefi¬ 
nite  integral  means  doing  the  op¬ 
posite  of  differentiation.  All  the 
possible  indefinite  integrals  are  the 
same  function  except  for  an  addi¬ 
tive  constant. 

Example  52 

> Find the indefinite integral of the function $\dot{x}(t) = t$.

> Any function of the form

\[ x(t) = t^2/2 + c, \]

where $c$ is a constant, is an indefinite integral of this function,
since its derivative is $t$.


x(b )  —  x(a). 


Definition of the definite integral
If $\dot{x}$ is a function, then the definite integral of $\dot{x}$ from a to b is defined as

\[ \int_a^b \dot{x}(t)\,dt = \lim_{H\to\infty} \sum_{i=0}^{H} \dot{x}(a + i\,\Delta t)\,\Delta t, \]

where $\Delta t = (b - a)/H$.

Interpretation: What we're calculating is the area under the graph of $\dot{x}$, from a to b. (If the graph dips below the t axis, we interpret the area between it and the axis as a negative area.) The thing inside the limit is a calculation like the one done in example 51, but generalized to a ≠ 0. If H was infinite, then Δt would be an infinitesimal number dt.


4.2  The  fundamental 
theorem  of 
calculus 


The  fundamental  theorem  is 
proved  on  page  154.  The  idea  it 
expresses  is  that  integration  and 
differentiation  are  inverse  opera¬ 
tions.  That  is,  integration  undoes 
differentiation,  and  differentiation 
undoes  integration. 

Example 53

> Interpret the definite integral

\[ \int_1^2 \frac{1}{t}\,dt \]

graphically; then evaluate it both symbolically and numerically, and check that the two results are consistent.


The fundamental theorem of calculus

Let $x$ be an indefinite integral of $\dot{x}$, and let $\dot{x}$ be a continuous function (one whose graph is a single connected curve). Then

\[ \int_a^b \dot{x}(t)\,dt = x(b) - x(a). \]


d / The definite integral $\int_1^2 (1/t)\,dt$.


>  Figure  d  shows  the  graphical  inter¬ 
pretation.  The  numerical  calculation 
requires  a  trivial  variation  on  the  pro¬ 
gram  from  example  51 : 


a := 1;
b := 2;
H := 1000;
dt := (b-a)/H;
sum := 0;
t := a;
While (t<=b) [
  sum := N(sum+(1/t)*dt);
  t := N(t+dt);
];
Echo(sum);


and

\[ \frac{d}{dx}(cf) = c\,\frac{df}{dx}. \]

But since the indefinite integral is just the operation of undoing a derivative, the same kind of rules must hold true for indefinite integrals as well:

\[ \int (f + g)\,dx = \int f\,dx + \int g\,dx \]


The result is 0.693897243, and increasing H to 10,000 gives 0.6932221811, so we can be fairly confident that the result equals 0.693, to 3 decimal places.

Symbolically, the indefinite integral is x = ln t. Using the fundamental theorem of calculus, the area is ln 2 - ln 1 ≈ 0.693147180559945.

Judging from the graph, it looks plausible that the shaded area is about 0.7.


and

\[ \int (cf)\,dx = c\int f\,dx. \]

And since a definite integral can be found by plugging in the upper and lower limits of integration into the indefinite integral, the same properties must be true of definite integrals as well.

Example  54 

>  Evaluate  the  indefinite  integral 


This  is  an  interesting  example,  be¬ 
cause  the  natural  log  blows  up  to  neg¬ 
ative  infinity  as  t  approaches  0,  so  it’s 
not  possible  to  add  a  constant  onto 
the  indefinite  integral  and  force  it  to  be 
equal  to  0  at  t  =  0.  Nevertheless,  the 
fundamental  theorem  of  calculus  still 
works. 


4.3  Properties  of  the 
integral 

Let f and g be two functions of x, and let c be a constant. We already know that for derivatives,

\[ \frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx} \]


\[ \int (x + 2\sin x)\,dx. \]

> Using the additive property, the integral becomes

\[ \int x\,dx + \int 2\sin x\,dx. \]

Then the property of scaling by a constant lets us change this to

\[ \int x\,dx + 2\int \sin x\,dx. \]

We need a function whose derivative is x, which would be $x^2/2$, and one whose derivative is sin x, which must be -cos x, so the result is

\[ \frac{1}{2}x^2 - 2\cos x + c. \]

4.4 Applications

Averages 

In  the  story  of  Gauss’s  problem  of 
adding  up  the  numbers  from  1  to 
100,  one  interpretation  of  the  re¬ 
sult,  5,050,  is  that  the  average  of 
all  the  numbers  from  1  to  100  is 


50.5.  This  is  the  ordinary  defini¬ 
tion  of  an  average:  add  up  all  the 
things  you  have,  and  divide  by  the 
number  of  things.  (The  result  in 
this  example  makes  sense,  because 
half  the  numbers  are  from  1  to  50, 
and  half  are  from  51  to  100,  so  the 
average  is  half-way  between  50  and 
51.) 

Similarly,  a  definite  integral  can 
also  be  thought  of  as  a  kind  of  aver¬ 
age.  In  general,  if  y  is  a  function  of 
x ,  then  the  average,  or  mean,  value 
of  y  on  the  interval  from  x  =  a  to 
b  can  be  defined  as 

\[ \bar{y} = \frac{1}{b-a}\int_a^b y\,dx. \]

In  the  continuous  case,  dividing  by 
b  —  a  accomplishes  the  same  thing 
as  dividing  by  the  number  of  things 
in  the  discrete  case. 
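As a numerical illustration of this definition, here is a minimal Python sketch. The helper name `average_value` and the use of a midpoint rule are assumptions made for this example, not something taken from the text.

```python
def average_value(y, a, b, H=100000):
    """Approximate (1/(b - a)) times the integral of y from a to b.

    The integral is estimated with H midpoint rectangles.
    """
    dx = (b - a) / H
    integral = sum(y(a + (i + 0.5) * dx) for i in range(H)) * dx
    return integral / (b - a)


# A constant function should average to itself.
print(average_value(lambda x: 7.0, 2, 5))
# The average of x^2 on the interval from 0 to 1 should come out near 1/3.
print(average_value(lambda x: x**2, 0, 1))
```

Dividing by b - a plays the same role here as dividing by the number of items in a discrete average.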

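Example 55

> Show that the definition of the average makes sense in the case where the function is a constant.

> If y is a constant, then we can take it outside of the integral, so

\[ \bar{y} = \frac{1}{b-a}\,y\int_a^b dx = \frac{1}{b-a}\,y\,(b-a) = y. \]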

Example 56

> Find the average value of the function $y = x^2$ for values of x ranging from 0 to 1.

>
\[ \bar{y} = \frac{1}{1-0}\int_0^1 x^2\,dx = \left.\frac{1}{3}x^3\right|_0^1 = \frac{1}{3} \]


The mean value theorem
If the continuous function y(x) has the average value $\bar{y}$ on the interval from x = a to b, then y attains its average value at least once in that interval, i.e., there exists ξ with a < ξ < b such that $y(\xi) = \bar{y}$.


The mean value theorem is proved on page 161. The special case in which $\bar{y} = 0$ is known as Rolle's theorem.

Example  57 

>  Verify  the  mean  value  theorem  for 
y  =  x2  on  the  interval  from  0  to  1 . 


> The mean value is 1/3, as shown in example 56. This value is achieved at $x = \sqrt{1/3} = 1/\sqrt{3}$, which lies between 0 and 1.




Work 


The reason W grows like $a^2$, not just like a, is that as the spring is compressed more, more and more effort is required in order to compress it.


In  physics,  work  is  a  measure  of 
the  amount  of  energy  transferred 
by  a  force;  for  example,  if  a  horse 
sets  a  wagon  in  motion,  the  horse’s 
force  on  the  wagon  is  putting  some 
energy  of  motion  into  the  wagon. 
When a force F acts on an object that moves in the direction of the force by an infinitesimal distance dx, the infinitesimal work done is dW = F dx. Integrating both sides, we have $W = \int_a^b F\,dx$, where the force may depend on x, and a and b represent the initial and final positions of the object.

Example 58

> A spring compressed by an amount x relative to its relaxed length provides a force F = kx. Find the amount of work that must be done in order to compress the spring from x = 0 to x = a. (This is the amount of energy stored in the spring, and that energy will later be released into the toy bullet.)

>
\[ W = \int_0^a F\,dx = \int_0^a kx\,dx = \frac{1}{2}ka^2 \]


Probability 

Mathematically,  the  probability 
that  something  will  happen  can  be 
specified  with  a  number  ranging 
from  0  to  1,  with  0  representing  im¬ 
possibility  and  1  representing  cer¬ 
tainty.  If  you  flip  a  coin,  heads  and 
tails  both  have  probabilities  of  1/2. 
The probabilities of all the possible outcomes have to add up to 1. This is called normalization.


e  /  Normalization:  the 

probability  of  picking 
land  plus  the  probability 
of  picking  water  adds  up 
to  1. 

So far we've discussed random processes having only two possible outcomes: yes or no, win or lose,
on  or  off.  More  generally,  a  ran¬ 
dom  process  could  have  a  result 
that  is  a  number.  Some  processes 
yield  integers,  as  when  you  roll  a 
die  and  get  a  result  from  one  to 
six,  but  some  are  not  restricted  to 
whole  numbers,  e.g.,  the  height  of 
a  human  being,  or  the  amount  of 
time  that  a  uranium-238  atom  will 
exist  before  undergoing  radioactive 
decay.  The  key  to  handling  these 
continuous  random  variables  is  the 
concept  of  the  area  under  a  curve, 
i.e.,  an  integral. 


result 

f  /  Probability  distribution  for  the  result 
of  rolling  a  single  die. 

Consider  a  throw  of  a  die.  If  the  die 
is  “honest,”  then  we  expect  all  six 
values to be equally likely. Since all six probabilities must add up to 1, the probability of any particular value coming up must be 1/6. We can summarize this in a graph, f.
Areas  under  the  curve  can  be  inter¬ 
preted  as  total  probabilities.  For 
instance,  the  area  under  the  curve 
from  1  to  3  is  1/6+1/6+1/6  =  1/2, 
so  the  probability  of  getting  a  re¬ 


sult  from  1  to  3  is  1/2.  The  func¬ 
tion  shown  on  the  graph  is  called 
the  probability  distribution. 




g  /  Rolling  two  dice  and  adding  them 
up. 


Figure  g  shows  the  probabilities  of 
various  results  obtained  by  rolling 
two  dice  and  adding  them  to¬ 
gether,  as  in  the  game  of  craps. 
The  probabilities  are  not  all  the 
same.  There  is  a  small  probability 
of  getting  a  two,  for  example,  be¬ 
cause  there  is  only  one  way  to  do  it, 
by  rolling  a  one  and  then  another 
one.  The  probability  of  rolling  a 
seven  is  high  because  there  are  six 
different  ways  to  do  it:  1+6,  2+5, 
etc. 

If the number of possible outcomes is large but finite, for example the number of hairs on a dog, the graph would start to look like a smooth curve rather than a ziggurat.

What about probability distributions for random numbers that are not integers? We can no longer make a graph with probability on the y axis, because the probability of getting a given exact number is typically zero. For instance, there is zero probability that a person will be exactly 200 cm tall, since there are infinitely many possible results that are close to 200 but not exactly 200, for example 199.99999999687687658766. It
doesn’t  usually  make  sense,  there¬ 
fore,  to  talk  about  the  probability 
of  a  single  numerical  result,  but  it 
does  make  sense  to  talk  about  the 
probability  of  a  certain  range  of  re¬ 
sults.  For  instance,  the  probability 
that  a  randomly  chosen  person  will 
be  more  than  170  cm  and  less  than 
200  cm  in  height  is  a  perfectly  rea¬ 
sonable  thing  to  discuss.  We  can 
still  summarize  the  probability  in¬ 
formation  on  a  graph,  and  we  can 
still  interpret  areas  under  the  curve 
as  probabilities. 


h / A probability distribution for human height (horizontal axis: height in cm, from 120 to 200).


But the y axis can no longer be a unitless probability scale. In the example of human height, we want the x axis to have units of centimeters, and we want areas under the curve to be unitless probabilities. The area of a single square on the graph paper is then

(unitless area of a square) = (width of square with distance units) × (height of square).

If the units are to cancel out, then the height of the square must evidently be a quantity with units of inverse centimeters. In other words, the y axis of the graph is to be interpreted as probability per unit height, not probability.

Another way of looking at it is that the y axis on the graph gives a derivative, dP/dx, where dP is the infinitesimally small probability that x will lie in the infinitesimally small range covered by dx.

Example  59 

>  A  computer  language  will  typically 
have  a  built-in  subroutine  that  pro¬ 
duces  a  fairly  random  number  that 
is  equally  likely  to  take  on  any  value 
in  the  range  from  0  to  1.  If  you 
take  the  absolute  value  of  the  differ¬ 
ence  between  two  such  numbers,  the 
probability  distribution  is  of  the  form 
dP/dx = k(1 - x). Find the value of the constant k that is required by normalization.

>
\begin{align*}
1 &= \int_0^1 k(1 - x)\,dx \\
  &= \left.\left(kx - \frac{1}{2}kx^2\right)\right|_0^1 \\
  &= k - k/2 \\
k &= 2
\end{align*}
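The claimed form of the distribution can be checked by simulation. The following Python sketch is only an illustration: the function name `simulate_differences`, the sample size, and the fixed seed are assumptions chosen here. With dP/dx = 2(1 - x), the area under the curve from 0 to 0.5 is 2(0.5 - 0.125) = 0.75, so about three quarters of the simulated values should fall below 0.5.

```python
import random


def simulate_differences(n, seed=0):
    """Return n samples of |u1 - u2|, with u1 and u2 uniform on [0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [abs(rng.random() - rng.random()) for _ in range(n)]


samples = simulate_differences(200_000)

# Fraction of samples below 0.5; normalization with k = 2 predicts 0.75.
fraction_below_half = sum(s < 0.5 for s in samples) / len(samples)
print(fraction_below_half)
```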


Self-Check 

Compare  the  number  of  people  with 
heights  in  the  range  of  130-135  cm  to 
the  number  in  the  range  135-140.  > 

Answer, p. 165


i  /  The  average  can  be  interpreted  as 
the  balance  point  of  the  probability  dis¬ 
tribution. 

When  one  random  variable  is  re¬ 
lated  to  another  in  some  mathe¬ 
matical  way,  the  chain  rule  can  be 
used  to  relate  their  probability  dis¬ 
tributions. 


x 


j  /  Example  60. 


Example 60

> A laser is placed one meter away from a wall, and spun on the ground to give it a random direction, but if the angle u shown in figure j doesn't come out in the range from 0 to π/2, the laser is spun again until an angle in the desired range is obtained. Find the probability distribution of the distance x shown in the figure. The derivative $d\tan^{-1} z/dz = 1/(1+z^2)$ will be required (see example 66, page 88).

> Since any angle between 0 and π/2 is equally likely, the probability distribution dP/du must be a constant, and normalization tells us that the constant must be dP/du = 2/π.

The laser is one meter from the wall, so the distance x, measured in meters, is given by x = tan u. For the probability distribution of x, we have

\begin{align*}
\frac{dP}{dx} &= \frac{dP}{du}\,\frac{du}{dx} \\
  &= \frac{2}{\pi}\,\frac{d\tan^{-1} x}{dx} \\
  &= \frac{2}{\pi(1 + x^2)}
\end{align*}

Note that the range of possible values of x theoretically extends from 0 to infinity. Problem 7 on page 104 deals with this.

If  the  next  Martian  you  meet  asks 
you,  “How  tall  is  an  adult  hu¬ 
man?,”  you  will  probably  reply 
with  a  statement  about  the  average 
human  height,  such  as  “Oh,  about 
5  feet  6  inches.”  If  you  wanted  to 
explain  a  little  more,  you  could  say, 
“But  that’s  only  an  average.  Most 
people  are  somewhere  between  5 
feet  and  6  feet  tall.”  Without 
bothering  to  draw  the  relevant  bell 
curve  for  your  new  extraterrestrial 
acquaintance,  you’ve  summarized 
the  relevant  information  by  giving 
an  average  and  a  typical  range  of 
variation.  The  average  of  a  prob¬ 
ability  distribution  can  be  defined 
geometrically  as  the  horizontal  po¬ 
sition  at  which  it  could  be  balanced 
if it was constructed out of cardboard (figure i). This is a different way of working with averages than the one we used earlier. Before, when we had a graph of y versus x, we implicitly assumed that all values of x were equally likely, and we found an average value of y. In this new
method  using  probability  distribu¬ 
tions,  the  variable  we’re  averaging 
is  on  the  x  axis,  and  the  y  axis  tells 
us  the  relative  probabilities  of  the 
various  x  values. 

For a discrete-valued variable with n possible values, the average would be

\[ \bar{x} = \sum_{i=1}^{n} x_i\,P(x_i), \]

and in the case of a continuous variable, this becomes an integral,

\[ \bar{x} = \int_a^b x\,\frac{dP}{dx}\,dx. \]

Example 61

> For the situation described in example 59, find the average value of x.

>
\begin{align*}
\bar{x} &= \int_0^1 x\,\frac{dP}{dx}\,dx \\
  &= \int_0^1 x \cdot 2(1 - x)\,dx \\
  &= 2\int_0^1 (x - x^2)\,dx \\
  &= \frac{1}{3}
\end{align*}

Sometimes  we  don’t  just  want  to 
know  the  average  value  of  a  cer¬ 
tain  variable,  we  also  want  to  have 
some  idea  of  the  amount  of  varia¬ 
tion  above  and  below  the  average. 
The most common way of measuring this is the standard deviation, defined by

\[ \sigma = \sqrt{\int_a^b (x - \bar{x})^2\,\frac{dP}{dx}\,dx}. \]

The idea here is that if there was no variation at all above or below the average, then the quantity $(x - \bar{x})$ would be zero whenever dP/dx was nonzero, and the standard deviation would be zero. The reason for taking the square root of the whole thing is so that the result will have the same units as x.

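Example 62

> For the situation described in example 59, find the standard deviation of x.

> The square of the standard deviation is

\begin{align*}
\sigma^2 &= \int_0^1 (x - \bar{x})^2\,\frac{dP}{dx}\,dx \\
  &= \int_0^1 (x - 1/3)^2 \cdot 2(1 - x)\,dx \\
  &= \frac{1}{18},
\end{align*}

so the standard deviation is

\[ \sigma = \frac{1}{\sqrt{18}} \approx 0.236 \]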
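The results of examples 59, 61, and 62 can be cross-checked numerically. This is a minimal sketch, assuming a simple midpoint rule; the helper names `integrate` and `density` are invented for the illustration.

```python
import math


def integrate(f, a, b, H=100000):
    """Midpoint-rule estimate of the definite integral of f from a to b."""
    dx = (b - a) / H
    return sum(f(a + (i + 0.5) * dx) for i in range(H)) * dx


def density(x):
    """dP/dx = 2(1 - x), the normalized distribution from example 59."""
    return 2.0 * (1.0 - x)


norm = integrate(density, 0, 1)                       # should be close to 1
mean = integrate(lambda x: x * density(x), 0, 1)      # should be close to 1/3
variance = integrate(lambda x: (x - mean) ** 2 * density(x), 0, 1)
std = math.sqrt(variance)                             # close to 1/sqrt(18)

print(norm, mean, std)
```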


Problems

1  Write  a  computer  program 
similar  to  the  one  in  example  53 
on  page  74  to  evaluate  the  definite 
integral 


>  Solution,  p.  188 

2 Evaluate the integral

\[ \int_0^{2\pi} \sin x\,dx, \]

and draw a sketch to explain why your result comes out the way it does. > Solution, p. 188

3  Sketch  the  graph  that  repre¬ 
sents  the  definite  integral 


and  estimate  the  result  roughly 
from  the  graph.  Then  evaluate  the 
integral  exactly,  and  check  against 
your  estimate. 

>  Solution,  p.  189 

4 Make a rough guess as to the average value of sin x for 0 < x < π, and then find the exact result and check it against your guess.

>  Solution,  p.  190 

5  Show  that  the  mean  value  the¬ 
orem’s  assumption  of  continuity  is 
necessary,  by  exhibiting  a  discon¬ 
tinuous  function  for  which  the  the¬ 
orem  fails.  >  Solution,  p.  190 

6  Show  that  the  fundamental 
theorem  of  calculus’s  assumption 
of  continuity  for  x  is  necessary,  by 


exhibiting  a  discontinuous  function 
for  which  the  theorem  fails. 

>  Solution,  p.  190 

7 Sketch the graphs of $y = x^2$ and $y = \sqrt{x}$ for 0 < x < 1. Graphically, what relationship should exist between the integrals $\int_0^1 x^2\,dx$ and $\int_0^1 \sqrt{x}\,dx$? Compute both integrals, and verify that the results are related in the expected way.

8 Evaluate $\int \sqrt{bx\sqrt{x}}\,dx$, where b is a constant.

>  Solution,  p.  190 

9 In a gasoline-burning car engine, the exploding air-gas mixture makes a force on the piston, and the force tapers off as the piston moves, allowing the gas to expand. (a) In the approximation F = k/x, where x is the position of the piston, find the work done on the piston as it travels from x = a to x = b, and show that the result only depends on the ratio b/a. This ratio is known as the compression ratio of the engine. (b) A better approximation, which takes into account the cooling of the air-gas mixture as it expands, is $F = kx^{-1.4}$. Compute the work done in this case.


Problem 9.

10 A certain variable x varies randomly from -1 to 1, with probability distribution $dP/dx = k(1 - x^2)$.

(a) Determine k from the requirement of normalization.

(b)  Find  the  average  value  of  x. 

(c)  Find  its  standard  deviation. 

11  Suppose  that  we’ve  already 
established  that  the  derivative  of 
an  odd  function  is  even,  and  vice 
versa.  (See  problem  30,  p.  50.) 
Something  similar  can  be  proved 
for  integration.  However,  the  fol¬ 
lowing  is  not  quite  right. 

Let f be even, and let $g = \int f(x)\,dx$ be its indefinite integral.
Then  by  the  fundamental  theorem 
of  calculus,  f  is  the  derivative  of 
g.  Since  we’ve  already  established 
that  the  derivative  of  an  odd  func¬ 
tion  is  even,  we  conclude  that  g  is 
odd. 

Find  all  errors  in  the  proof. 

>  Solution,  p.  190 

12  A  perfectly  elastic  ball 
bounces  up  and  down  forever,  al¬ 
ways  coming  back  up  to  the  same 
height  h.  Find  its  average  height. 




Problem  13. 

13  The  figure  shows  a  curve  with 
a  tangent  line  segment  of  length  1 
that  sweeps  around  it,  forming  a 
new  curve  that  is  usually  outside 
the old one. Prove Holditch's theorem, which states that the new curve's area differs from the old one's by π. (This is an example
of  a  result  that  is  much  more  dif¬ 
ficult  to  prove  without  making  use 
of  infinitesimals.)  * 


5  Techniques 


5.1  Newton’s  method 

In  the  1958  science  fiction  novel 
Have  Space  Suit  —  Will 
Travel,  by  Robert  Heinlein,  Kip 
is  a  high  school  student  who  wants 
to  be  an  engineer,  and  his  father  is 
trying  to  convince  him  to  stretch 
himself  more  if  he  wants  to  get  any¬ 
thing  out  of  his  education: 

“Why  did  Van  Buren  fail  of  re- 
election?  How  do  you  extract  the 
cube  root  of  eighty-seven?" 

Van  Buren  had  been  a  president; 
that  was  all  I  remembered.  But  I 
could  answer  the  other  one.  “If 
you  want  a  cube  root,  you  look  in 
a  table  in  the  back  of  the  book.  ’’ 

Dad  sighed.  “Kip,  do  you  think 
that  table  was  brought  down  from 
on  high  by  an  archangel?” 

We no longer use tables to compute roots, but how does a pocket calculator do it? A technique called Newton's method allows us to calculate the inverse of any function efficiently, including cases that aren't preprogrammed into a calculator. In the example from the novel, we know how to calculate the function $y = x^3$ fairly accurately and quickly for any given value of x, but we want to turn the equation around and find x when y = 87. We start with a rough mental guess: since $4^3 = 64$ is a little too small, and $5^3 = 125$ is much too big, we guess x ≈ 4.3. Testing our guess, we have $4.3^3 = 79.5$. We want y to get bigger by 7.5, and we can use calculus to find approximately how much bigger x needs to get in order to accomplish that:


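\begin{align*}
\Delta x &\approx \frac{dx}{dy}\,\Delta y \\
  &= \frac{\Delta y}{dy/dx} \\
  &= \frac{\Delta y}{3x^2} \\
  &= \frac{7.5}{3(4.3)^2} \\
  &= 0.14
\end{align*}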


Increasing our value of x to 4.3 + 0.14 = 4.44, we find that $4.44^3 = 87.5$ is a pretty good approximation to 87. If we need higher precision, we can go through the process again with Δy = -0.5, giving

\begin{align*}
\Delta x &\approx \frac{\Delta y}{3x^2} = -0.01 \\
x &= 4.43 \\
x^3 &= 86.9.
\end{align*}


This  second  iteration  gives  an  ex¬ 
cellent  approximation. 
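The iteration just carried out by hand can be automated. The sketch below is an illustrative Python version under stated assumptions: the function name `newton_inverse`, the stopping tolerance, and the iteration cap are choices made for this example, not part of the text.

```python
def newton_inverse(f, dfdx, y_target, x, tolerance=1e-12, max_iterations=50):
    """Find x such that f(x) = y_target, starting from the guess x.

    Each step computes how far y is from the target and corrects x
    by delta_y / (dy/dx), exactly as in the hand calculation.
    """
    for _ in range(max_iterations):
        delta_y = y_target - f(x)
        if abs(delta_y) < tolerance:
            break
        x += delta_y / dfdx(x)
    return x


# Cube root of 87, starting from the rough mental guess of 4.3.
root = newton_inverse(lambda x: x**3, lambda x: 3 * x**2, 87.0, 4.3)
print(root, root**3)
```

Passing the derivative in as a separate function keeps the routine usable for any function whose derivative we know, not just cubes.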


Example 63

> Figure a shows the astronomer Johannes Kepler's analysis of the motion of the planets. The ellipse is the orbit of the planet around the sun. At t = 0, the planet is at its closest approach to the sun, A. At some later time, the planet is at point B. The angle x (measured in radians) is defined with reference to the imaginary circle encompassing the orbit. Kepler found the equation

\[ 2\pi\,\frac{t}{T} = x - e\sin x, \]

where the period, T, is the time required for the planet to complete a full orbit, and the eccentricity of the ellipse, e, is a number that measures how much it differs from a circle. The relationship is complicated because the planet speeds up as it falls inward toward the sun, and slows down again as it swings back away from it.

The planet Mercury has e = 0.206. Find the angle x when Mercury has completed 1/4 of a period.

a / Example 63.

> We have

\[ y = x - (0.206)\sin x, \]

and we want to find x when y = 2π/4 = 1.57. As a first guess, we try x = π/2 (90 degrees), since the eccentricity of Mercury's orbit is actually much smaller than the example shown in the figure, and therefore the planet's speed doesn't vary all that much as it goes around the sun. For this value of x we have y = 1.36, which is too small by 0.21.

\begin{align*}
\Delta x &\approx \frac{\Delta y}{dy/dx} \\
  &= \frac{0.21}{1 - (0.206)\cos x} \\
  &= 0.21
\end{align*}

(The derivative dy/dx happens to be 1 at x = π/2.) This gives a new value of x, 1.57 + 0.21 = 1.78. Testing it, we have y = 1.58, which is correct to within rounding errors after only one iteration. (We were only supplied with a value of e accurate to three significant figures, so we can't get a result with precision better than about that level.)
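For readers who want to carry the iteration further than one step, here is a hedged Python sketch of the same calculation. The function name `kepler_angle` and the fixed number of iterations are assumptions for illustration; the eccentricity is the value quoted in the example.

```python
import math

ECCENTRICITY = 0.206  # Mercury, as given in the example


def kepler_angle(y_target, x, iterations=5):
    """Solve y_target = x - e*sin(x) for x by Newton's method."""
    for _ in range(iterations):
        y = x - ECCENTRICITY * math.sin(x)
        dydx = 1.0 - ECCENTRICITY * math.cos(x)
        x += (y_target - y) / dydx  # correct x by delta_y / (dy/dx)
    return x


# One quarter of a period: y = 2*pi/4, first guess x = pi/2.
x = kepler_angle(2 * math.pi / 4, math.pi / 2)
print(x)
```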

5.2  Implicit 

differentiation 

We  can  differentiate  any  function 
that  is  written  as  a  formula,  and 
find  a  result  in  terms  of  a  formula. 
However,  sometimes  the  original 
problem  can’t  be  written  in  any 
nice way as a formula. For example, suppose we want to find dy/dx in a case where the relationship between x and y is given by the following equation:

\[ y^7 + y = x^7 + x^2. \]


y  =  x  —  (0.206)  sin  x, 


There is no equivalent of the quadratic formula for seventh-order polynomials, so we have no way to solve for one variable in terms of the other in order to differentiate it. However, we can still find dy/dx in terms of x and y. Suppose we let x grow to x + dx. Then for example the $x^2$ term will grow to $(x + dx)^2 = x^2 + 2x\,dx + dx^2$. The squared infinitesimal is negligible, so the increase in $x^2$ was really just 2x dx, and we've really just computed the derivative of $x^2$ with respect to x and multiplied it by dx. In symbols,

\[ d(x^2) = \frac{d(x^2)}{dx}\cdot dx = 2x\,dx. \]

That is, the change in $x^2$ is 2x times the change in x. Doing this to both sides of the original equation, we have

\begin{align*}
d(y^7 + y) &= d(x^7 + x^2) \\
7y^6\,dy + 1\,dy &= 7x^6\,dx + 2x\,dx \\
(7y^6 + 1)\,dy &= (7x^6 + 2x)\,dx \\
\frac{dy}{dx} &= \frac{7x^6 + 2x}{7y^6 + 1}
\end{align*}

This still doesn't give us a formula for the derivative in terms of x alone, but it's not entirely useless. For instance, if we're given a numerical value of x, we can always use Newton's method to find y, and then evaluate the derivative.


5.3 Methods of integration

Change of variable

Sometimes an unfamiliar-looking integral can be made into a familiar one by substituting a new variable for an old one. For example, we know how to integrate 1/x (the answer is ln x), but what about

\[ \int \frac{dx}{2x + 1}\,? \]

Let u = 2x + 1. Differentiating both sides, we have du = 2 dx, or dx = du/2, so

\begin{align*}
\int \frac{dx}{2x + 1} &= \int \frac{du/2}{u} \\
  &= \frac{1}{2}\ln u + c \\
  &= \frac{1}{2}\ln(2x + 1) + c.
\end{align*}

This technique is known as a change of variable or a substitution. (Because the letter u is often employed, you may also see it called u-substitution.)

In the case of a definite integral, we have to remember to change the limits of integration to reflect the new variable.

Example 64

> Evaluate $\int_3^4 dx/(2x + 1)$.

> As before, let u = 2x + 1.

\begin{align*}
\int_{x=3}^{x=4} \frac{dx}{2x + 1} &= \int_{u=7}^{u=9} \frac{du/2}{u} \\
  &= \left.\frac{1}{2}\ln u\,\right|_{u=7}^{u=9}
\end{align*}

Here the notation $|_{u=7}^{u=9}$ means to evaluate the function at 7 and 9, and subtract the former from the latter. The result is

\begin{align*}
\int_3^4 \frac{dx}{2x + 1} &= \frac{1}{2}(\ln 9 - \ln 7) \\
  &= \frac{1}{2}\ln\frac{9}{7}
\end{align*}
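A quick numerical check can confirm that changing both the variable and the limits leaves the value of the integral unchanged. This Python sketch assumes a midpoint rule; the helper name `midpoint_integral` is invented for the illustration.

```python
import math


def midpoint_integral(f, a, b, H=100000):
    """Midpoint-rule estimate of the definite integral of f from a to b."""
    dx = (b - a) / H
    return sum(f(a + (i + 0.5) * dx) for i in range(H)) * dx


# Original integral, in terms of x, with limits 3 and 4.
in_x = midpoint_integral(lambda x: 1.0 / (2.0 * x + 1.0), 3.0, 4.0)
# After substituting u = 2x + 1, dx = du/2, the limits become 7 and 9.
in_u = midpoint_integral(lambda u: 0.5 / u, 7.0, 9.0)
# Symbolic result from the example.
symbolic = 0.5 * math.log(9.0 / 7.0)

print(in_x, in_u, symbolic)
```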


Sometimes, as in the next example, a clever substitution is the secret to doing a seemingly impossible integral.

Example 65

> Evaluate

\[ \int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx. \]

> The only hope for reducing this to a form we can do is to let $u = \sqrt{x}$. Then $dx = d(u^2) = 2u\,du$, so

\begin{align*}
\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx &= \int \frac{e^u}{u}\cdot 2u\,du \\
  &= 2\int e^u\,du \\
  &= 2e^u \\
  &= 2e^{\sqrt{x}}.
\end{align*}

Example 65 really isn't so tricky, since there was only one logical choice for the substitution that had any hope of working. The following is a little more dastardly.

Example 66

> Evaluate

\[ \int \frac{dx}{1 + x^2}. \]

> The substitution that works is x = tan u. First let's see what this does to the expression $1 + x^2$. The familiar identity

\[ \sin^2 u + \cos^2 u = 1, \]

when divided by $\cos^2 u$, gives

\[ \tan^2 u + 1 = \sec^2 u, \]

so $1 + x^2$ becomes $\sec^2 u$. But differentiating both sides of x = tan u gives

\begin{align*}
dx &= d\left[\sin u\,(\cos u)^{-1}\right] \\
  &= (d\sin u)(\cos u)^{-1} + (\sin u)\,d\left[(\cos u)^{-1}\right] \\
  &= \left(1 + \tan^2 u\right)du \\
  &= \sec^2 u\,du,
\end{align*}

so the integral becomes

\begin{align*}
\int \frac{dx}{1 + x^2} &= \int \frac{\sec^2 u\,du}{\sec^2 u} \\
  &= u + c \\
  &= \tan^{-1} x + c.
\end{align*}

What  mere  mortal  would  ever 
have  suspected  that  the  substitu¬ 
tion  x  =  tan  u  was  the  one  that 
was  needed  in  example  66?  One 
possible  answer  is  to  give  up  and 
do the integral on a computer:

Integrate(x) 1/(1+x^2)
ArcTan(x)

Another  possible  answer  is  that 
you  can  usually  smell  the  pos¬ 
sibility  of  this  type  of  substitu¬ 
tion,  involving  a  trig  function, 
when  the  thing  to  be  integrated 
contains  something  reminiscent  of 
the  Pythagorean  theorem,  as  sug¬ 
gested  by  figure  b.  The  1  +  x2 
looks  like  what  you’d  get  if  you 
had  a  right  triangle  with  legs  1  and 
x,  and  were  using  the  Pythagorean 
theorem  to  find  its  hypotenuse. 


b  /  The  substitution  x  = 
tan  u. 


x 

c  /  The  substitution  x  = 
cos  u. 

Integration  by  parts 

Figure  cl  shows  a  technique  called 
integration  by  parts.  If  the  inte¬ 
gral  f  v  dw  is  easier  than  the  inte¬ 
gral  /  udv,  then  we  can  calculate 
the  easier  one,  and  then  by  sim¬ 
ple  geometry  determine  the  one  we 
wanted.  Identifying  the  large  rect¬ 
angle  that  surrounds  both  shaded 
areas,  and  the  small  white  rectan¬ 
gle  on  the  lower  left,  we  have 

J  udv  =(area  of  large  rectangle) 

—  (area  of  small  rectangle) 
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u 


Since a definite integral can always be done by evaluating an indefinite integral at its upper and lower limits, one usually uses this form. Integrals don’t usually come prepackaged in a form that makes it obvious that you should use integration by parts. What the equation for integration by parts tells us is that if we can split up the integrand into two factors, one of which (the $dv$) we know how to integrate, we have the option of changing the integral into a new form in which that factor becomes its integral, and the other factor becomes its derivative. If we choose the right way of splitting up the integrand into parts, the result can be a simplification.

Example 68

> Evaluate

$\int x \cos x\,dx$

> There are two obvious possibilities for splitting up the integrand into factors,

$u\,dv = (x)(\cos x\,dx)$

or

$u\,dv = (\cos x)(x\,dx)$.

The first one is the one that lets us make progress. If $u = x$, then $du = dx$, and if $dv = \cos x\,dx$, then integration gives $v = \sin x$.

$\int x \cos x\,dx = \int u\,dv$
$= uv - \int v\,du$
$= x \sin x - \int \sin x\,dx$
$= x \sin x + \cos x$

Of the two possibilities we considered for $u$ and $dv$, the reason this one helped was that differentiating $x$ gave $dx$, which was simpler, and integrating $\cos x\,dx$ gave $\sin x$, which was no more complicated than before. The second possibility would have made things worse rather than better, because integrating $x\,dx$ would have given $x^2/2$, which would have been more complicated rather than less.
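As a sanity check on example 68, the sketch below compares a numerical estimate of the definite integral of $x \cos x$ with the difference of the antiderivative $x \sin x + \cos x$ at the two limits. The limits 0 and 2, the helper name `simpson`, and the number of subintervals are arbitrary choices made for this illustration, not something taken from the text.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd-numbered interior points get weight 4, even-numbered ones weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return x * math.cos(x)

def antiderivative(x):
    # Result obtained by integration by parts in example 68.
    return x * math.sin(x) + math.cos(x)

a, b = 0.0, 2.0  # illustrative limits, chosen arbitrarily
numeric = simpson(integrand, a, b)
exact = antiderivative(b) - antiderivative(a)
print(numeric, exact)
```

If the two printed numbers agree to many digits, the antiderivative found by parts is at least consistent with the area under the curve on this interval.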


Example 69

> Evaluate $\int \ln x\,dx$.

> This one is a little tricky, because it isn’t explicitly written as a product, and yet we can attack it using integration by parts. Let $u = \ln x$ and $dv = dx$.

$\int \ln x\,dx = \int u\,dv$
$= uv - \int v\,du$
$= x \ln x - \int x\,\frac{dx}{x}$
$= x \ln x - x$

Example 70

> Evaluate $\int x^2 e^x\,dx$.

> Integration by parts lets us split the integrand into two factors, integrate one, differentiate the other, and then do that integral. Integrating or differentiating $e^x$ does nothing. Integrating $x^2$ increases the exponent, which makes the problem look harder, whereas differentiating $x^2$ knocks the exponent down a step, which makes it look easier. Let $u = x^2$ and $dv = e^x\,dx$, so that $du = 2x\,dx$ and $v = e^x$. We then have

$\int x^2 e^x\,dx = x^2 e^x - 2 \int x e^x\,dx$.

Although we don't immediately know how to evaluate this new integral, we can subject it to the same type of integration by parts, now with $u = x$ and $dv = e^x\,dx$. After the second integration by parts, we have:

$\int x^2 e^x\,dx = x^2 e^x - 2\left(x e^x - \int e^x\,dx\right)$
$= x^2 e^x - 2(x e^x - e^x)$
$= (x^2 - 2x + 2)e^x$

Partial fractions

Given a function like

$\frac{-1}{x-1} + \frac{1}{x+1}$,

we can rewrite it over a common denominator like this:

$\left(\frac{-1}{x-1}\right)\left(\frac{x+1}{x+1}\right) + \left(\frac{1}{x+1}\right)\left(\frac{x-1}{x-1}\right) = \frac{-x-1+x-1}{(x-1)(x+1)} = \frac{-2}{x^2-1}$.

But note that the original form is easily integrated to give

$\int \left(\frac{-1}{x-1} + \frac{1}{x+1}\right) dx = -\ln(x-1) + \ln(x+1) + c$,

while faced with the form $-2/(x^2-1)$, we wouldn’t have known how to integrate it.

Note that the original function was of the form $(-1)/\ldots + (+1)/\ldots$ It’s not a coincidence that the two constants on top, $-1$ and $+1$, are opposite in sign but equal in absolute value. To see why, consider the behavior of this function for large values of $x$. Looking at the form $-1/(x-1) + 1/(x+1)$, we might naively guess that for a large value of $x$ such as 1000, it would come out to be somewhere on the order of thousandths. But looking at the form $-2/(x^2-1)$, we would expect it to be way down in the millionths. This seeming paradox is resolved by noting that for large values of $x$, the two terms in the form $-1/(x-1) + 1/(x+1)$ very nearly cancel. This cancellation could only have happened if the constants on top were opposites like plus and minus one.

The idea of the method of partial fractions is that if we want to do an integral of the form

$\int \frac{dx}{P(x)}$,

where $P(x)$ is an nth order polynomial, we rewrite $1/P$ as

$\frac{1}{P(x)} = \frac{A_1}{x - r_1} + \ldots + \frac{A_n}{x - r_n}$,

where $r_1 \ldots r_n$ are the roots of the polynomial, i.e., the solutions of the equation $P(r) = 0$. If the polynomial is second-order, you can find the roots $r_1$ and $r_2$ using the quadratic formula; I’ll assume for the time being that they’re real. For higher-order polynomials, there is no surefire, easy way of finding the roots by hand, and you’d be smart simply to use computer software to do it. In Yacas, you can find the real roots of a polynomial like this:

FindRealRoots(x^4-5*x^3
-25*x^2+65*x+84)

{3.,7.,-4.,-1.}

(I assume it uses Newton’s method to find them.) The constants $A_i$ can then be determined by algebra, or by the following trick.

Numerical  method 

Suppose we evaluate $1/P(x)$ for a value of $x$ very close to one of the roots. In the example of the polynomial $x^4 - 5x^3 - 25x^2 + 65x + 84$, let $r_1 \ldots r_4$ be the roots in the order in which they were returned by Yacas. Then $A_1$ can be found by evaluating $1/P(x)$ at $x = 3.000001$:

P(x):=x^4-5*x^3-25*x^2
+65*x+84
N(1/P(3.000001))

-8928.5702094768

We know that for $x$ very close to 3, the expression

$\frac{1}{P} = \frac{A_1}{x-3} + \frac{A_2}{x-7} + \frac{A_3}{x+4} + \frac{A_4}{x+1}$

will be dominated by the $A_1$ term, so

$-8930 \approx \frac{A_1}{3.000001 - 3}$

$A_1 \approx (-8930)(10^{-6})$.

By the same method we can find the other three constants:

dx:=.000001
N(1/P(7+dx),30)*dx
0.2840908276e-2
N(1/P(-4+dx),30)*dx
-0.4329006192e-2
N(1/P(-1+dx),30)*dx
0.1041666664e-1

(The N( ,30) construct is to tell Yacas to do a numerical calculation rather than an exact symbolic one, and to use 30 digits of precision, in order to avoid problems with rounding errors.) Thus,

$\frac{1}{P} = \frac{-8.93 \times 10^{-3}}{x-3} + \frac{2.84 \times 10^{-3}}{x-7} - \frac{4.33 \times 10^{-3}}{x+4} + \frac{1.04 \times 10^{-2}}{x+1}$.

The desired integral is

$\int \frac{dx}{P(x)} = -8.93 \times 10^{-3} \ln(x-3) + 2.84 \times 10^{-3} \ln(x-7) - 4.33 \times 10^{-3} \ln(x+4) + 1.04 \times 10^{-2} \ln(x+1) + c$.

As in the simpler example I started off with, where $P$ was second order and we got $A_1 = -A_2$, in this $n = 4$ example we expect that $A_1 + A_2 + A_3 + A_4 = 0$, for otherwise the large-$x$ behavior of the partial-fraction form would be $1/x$ rather than $1/x^4$. This is a useful way of checking the result: $-8.93 + 2.84 - 4.33 + 10.4 = -.02 \approx 0$.
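The same numerical trick can be tried outside Yacas. The following is a minimal Python sketch, assuming ordinary double-precision floats are accurate enough here (the text uses 30 digits in Yacas to be safe); the names `P`, `roots`, and `coefficients` are chosen for this illustration only.

```python
def P(x):
    # The polynomial from the text: x^4 - 5x^3 - 25x^2 + 65x + 84.
    return x**4 - 5 * x**3 - 25 * x**2 + 65 * x + 84

roots = [3.0, 7.0, -4.0, -1.0]  # in the order returned by Yacas
dx = 1e-6                       # small but finite offset from each root

# Near the root r_i, 1/P(x) is dominated by A_i/(x - r_i),
# so A_i is approximately dx / P(r_i + dx).
coefficients = [dx / P(r + dx) for r in roots]

for r, A in zip(roots, coefficients):
    print(f"root {r:5.1f}:  A = {A:.6e}")

# The coefficients should very nearly cancel, as argued in the text.
print("sum of coefficients:", sum(coefficients))
```

With these inputs the four values should land close to the ones quoted above, and their sum should be small compared with any single coefficient.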

Complications 

There  are  two  possible  complica¬ 
tions: 


First, the same factor may occur more than once, as in $x^3 - 5x^2 + 7x - 3 = (x-1)(x-1)(x-3)$. In this example, we have to look for an answer of the form $A/(x-1) + B/(x-1)^2 + C/(x-3)$, the solution being $-.25/(x-1) - .5/(x-1)^2 + .25/(x-3)$.

Second,  the  roots  may  be  complex. 
This  is  no  show-stopper  if  you’re 
using  computer  software  that  han¬ 
dles  complex  numbers  gracefully. 
(You  can  choose  a  c  that  makes  the 
result  real.)  In  fact,  as  discussed  in 
section  8.3,  some  beautiful  things 
can  happen  with  complex  roots. 
But  as  an  alternative,  any  polyno¬ 
mial  with  real  coefficients  can  be 
factored  into  linear  and  quadratic 
factors  with  real  coefficients.  For 
each  quadratic  factor  Q(x),  we 
then  have  a  partial  fraction  of  the 
form  (A  +  Bx) /Q(x),  where  A  and 
B  can  be  determined  by  algebra. 
In  Yacas,  this  can  be  done  using 
the  Apart  function. 

Example  71 

> Evaluate the integral

$\int \frac{dx}{x^4 - 8x^3 + 8x^2 - 8x + 7}$

using the method of partial fractions.

> First we use Yacas to look for real roots of the polynomial:

FindRealRoots(x^4-8*x^3
+8*x^2-8*x+7)

Unfortunately this polynomial seems to have only two real roots; the rest are complex. We can divide out the factor $(x-1)(x-7)$, but that still leaves us with a second-order polynomial, which has no real roots. One approach would be to factor the polynomial into the form $(x-1)(x-7)(x-p)(x-q)$, where $p$ and $q$ are complex, as in section 8.3. Instead, let’s use Yacas to expand the integrand in terms of partial fractions:

Apart(1/(x^4-8*x^3
+8*x^2-8*x+7))

((2*x)/25+3/50)/(x^2+1)
+1/(300*(x-7))
+(-1)/(12*(x-1))

We can now rewrite the integral like this:

$\frac{2}{25} \int \frac{x\,dx}{x^2+1} + \frac{3}{50} \int \frac{dx}{x^2+1} + \frac{1}{300} \int \frac{dx}{x-7} - \frac{1}{12} \int \frac{dx}{x-1}$


In  fact,  Yacas  should  be  able  to  do 
the  whole  integral  for  us  from  scratch, 
but  it's  best  to  understand  how  these 


things  work  under  the  hood,  and  to 
avoid  being  completely  dependent  on 
one  particular  piece  of  software.  As 
an  illustration  of  this  gem  of  wisdom, 
I  found  that  when  I  tried  to  make  Ya¬ 
cas  evaluate  the  integral  in  one  gulp, 
it  choked  because  the  calculation  be¬ 
came  too  complicated!  Because  I  un¬ 
derstood  the  ideas  behind  the  proce¬ 
dure,  I  was  still  able  to  get  a  result 
through  a  mixture  of  computer  calcu¬ 
lations  and  working  it  by  hand.  Some¬ 
one  who  didn't  have  the  knowledge  of 
the  technique  might  have  tried  the  in¬ 
tegral  using  the  software,  seen  it  fail, 
and  concluded,  incorrectly,  that  the  in¬ 
tegral  was  one  that  simply  couldn’t  be 
done.  A  computer  is  no  substitute  for 
understanding. 
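In the same spirit of not trusting one piece of software blindly, the decomposition quoted in example 71 can be checked independently. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the function names and the choice of sample points are assumptions made for this illustration.

```python
from fractions import Fraction

def original(x):
    # Integrand of example 71.
    return 1 / (x**4 - 8 * x**3 + 8 * x**2 - 8 * x + 7)

def decomposed(x):
    # Partial-fraction form as quoted from the Apart output in the text.
    return ((Fraction(2, 25) * x + Fraction(3, 50)) / (x**2 + 1)
            + Fraction(1, 300) / (x - 7)
            + Fraction(-1, 12) / (x - 1))

# Exact rational sample points n/3, skipping n = 3 (the pole at x = 1).
# The other pole, x = 7, lies outside this range.
samples = [Fraction(n, 3) for n in range(-10, 11) if n != 3]
all_match = all(original(x) == decomposed(x) for x in samples)
print("decomposition agrees at", len(samples), "points:", all_match)
```

Two rational functions of this low degree that agree exactly at twenty points must be identical, so a result of True confirms the decomposition rather than merely suggesting it.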


Residue  method 

On p. 92 I introduced the trick of carrying out the method of partial fractions by evaluating $1/P(x)$ numerically at $x = r_i + \epsilon$, near where $1/P$ blows up. Sometimes we would like to have an exact result rather than a numerical approximation. We can accomplish this by using an infinitesimal number $dx$ rather than a small but finite $\epsilon$. For simplicity, let’s assume that all of the $n$ roots $r_i$ are distinct, and that $P$’s highest-order term is $x^n$. We can then write $P$ as the product $P(x) = (x - r_1)(x - r_2) \ldots (x - r_n)$. For products like this, there is a notation $\Pi$ (capital Greek letter “pi”) that works like $\Sigma$ does for sums:

$P(x) = \prod_{i=1}^{n} (x - r_i)$.

It’s not necessary that the roots be real, but for now we assume that they are. We want to find the coefficients $A_i$ such that

$\frac{1}{P(x)} = \sum_i \frac{A_i}{x - r_i}$.

We then have

$\frac{1}{P(r_i + dx)} = \frac{1}{dx \prod_{j \ne i} (r_i - r_j + dx)}$
$= \frac{1}{dx \prod_{j \ne i} (r_i - r_j)} + \ldots$
$= \frac{A_i}{dx} + \ldots$

where $\ldots$ represents finite terms that are negligible compared to the infinite ones. The product $\prod_{j \ne i} (r_i - r_j)$ is just $P'(r_i)$, as can be seen by applying the product rule to $P$ and evaluating at $x = r_i$. Multiplying on both sides by $dx$, we have

$\frac{1}{P'(r_i)} + \ldots = A_i$

where the $\ldots$ now stand for infinitesimals which must in fact cancel out, since both $A_i$ and $1/P'$ are real numbers.

Example 72

> The partial-fraction decomposition of the function

$\frac{1}{x^4 - 5x^3 - 25x^2 + 65x + 84}$

was found numerically on p. 92. The coefficient of the $1/(x-3)$ term was found numerically to be $A \approx -8.930 \times 10^{-3}$. Determine it exactly using the residue method.

> Differentiation gives $P'(x) = 4x^3 - 15x^2 - 50x + 65$. We then have $A = 1/P'(3) = -1/112$.

Integrals that can’t be done

Integral  calculus  was  invented  in 
the  age  of  powdered  wigs  and  harp¬ 
sichords,  so  the  original  emphasis 
was  on  expressing  integrals  in  a 
form  that  would  allow  numbers  to 
be  plugged  in  for  easy  numerical 
evaluation  by  scribbling  on  scraps 
of  parchment  with  a  quill  pen. 
This  was  an  era  when  you  might 
have  to  travel  to  a  large  city  to  get 
access  to  a  table  of  logarithms. 

In  this  computationally  impov¬ 
erished  environment,  one  always 
wanted  to  get  answers  in  what’s 
known  as  closed  form  and  in  terms 
of  elementary  functions. 

A  closed  form  expression  means 
one  written  using  a  finite  num¬ 
ber  of  operations,  as  opposed  to 
something  like  the  geometric  series 
1  +  x  +  x2  +  x3  + . . .,  which  goes  on 
forever. 

Elementary functions are usually taken to be addition, subtraction, multiplication, division, logs, and exponentials, as well as other functions derivable from these. For example, a cube root is allowed, since $\sqrt[3]{x} = e^{(1/3)\ln x}$, and so are trig functions and their inverses, since, as we will see in chapter 8, they can be expressed in terms of logs and exponentials.

In  theory,  “closed  form”  doesn’t 
mean  anything  unless  we  state  the 
elementary  functions  that  are  al¬ 
lowed.  In  practice,  when  people 
refer  to  closed  form,  they  usually 
have  in  mind  the  particular  set 
of  elementary  functions  described 
above. 

A  traditional  freshman  calculus 
course  spends  such  a  vast  amount 
of  time  teaching  you  how  to  do  in¬ 
tegrals  in  closed  form  that  it  may 
be  easy  to  miss  the  fact  that  this 
is  impossible  for  the  vast  majority 
of  integrands  that  you  might  ran¬ 
domly  write  down.  Here  are  some 
examples  of  impossible  integrals: 


$\int e^{-x^2}\,dx$

$\int x^x\,dx$

$\int \frac{\sin x}{x}\,dx$

$\int e^x \tan x\,dx$

The first of these is a form that is extremely important in statistics (it describes the area under the standard “bell curve”), so you can see that impossible integrals aren’t just obscure things that don’t pop up in real life.

People  who  are  proficient  at  doing 
integrals  in  closed  form  generally 


seem  to  work  by  a  process  of  pat¬ 
tern  matching.  They  recognize  cer¬ 
tain  integrals  as  being  of  a  form 
that  can’t  be  done,  so  they  know 
not  to  try. 

Example  73 

> Students! Stand at attention! You will now evaluate $\int e^{-x^2+7x}\,dx$ in closed form.

> No sir, I can’t do that. By a change of variables of the form $u = x + c$, where $c$ is a constant, we could clearly put this into the form $\int e^{-x^2}\,dx$, which we know is impossible.

Sometimes an integral such as $\int e^{-x^2}\,dx$ is important enough that we want to give it a name, tabulate it, and write computer subroutines that can evaluate it numerically. For example, statisticians define the “error function” $\mathrm{erf}(x) = (2/\sqrt{\pi}) \int e^{-x^2}\,dx$. Sometimes if you’re not sure whether an integral can be done in closed form, you can put it into computer software, which will tell you that it reduces to one of these functions. You then know that it can’t be done in closed form. For example, if you ask the popular web site integrals.com to do $\int e^{-x^2+7x}\,dx$, it spits back $(1/2)e^{49/4}\sqrt{\pi}\,\mathrm{erf}(x - 7/2)$. This tells you both that you shouldn’t be wasting your time trying to do the integral in closed form and that if you need to evaluate it numerically, you can do that using the erf function.
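To illustrate the last point, here is a small sketch that evaluates a definite version of this integral two ways: by brute-force numerical integration and by the erf expression quoted above. It relies on `math.erf` from the Python standard library; the limits 0 and 1 and the helper `simpson` are arbitrary choices for this example.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return math.exp(-x**2 + 7 * x)

def closed_form(x):
    # Antiderivative quoted in the text: (1/2) e^(49/4) sqrt(pi) erf(x - 7/2).
    return 0.5 * math.exp(49 / 4) * math.sqrt(math.pi) * math.erf(x - 3.5)

a, b = 0.0, 1.0  # illustrative limits, chosen arbitrarily
numeric = simpson(integrand, a, b)
via_erf = closed_form(b) - closed_form(a)
print(numeric, via_erf)
```

Agreement between the two numbers is a reasonable check that the erf expression really is an antiderivative of the integrand.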

As shown in the following example, just because an indefinite integral can’t be done, that doesn’t mean that we can never do a related definite integral.

Example 74

> Evaluate $\int_0^{\pi/2} e^{-\tan^2 x}(\tan^2 x + 1)\,dx$.

> The obvious substitution to try is $u = \tan x$, and this reduces the integrand to $e^{-u^2}$. This proves that the corresponding indefinite integral is impossible to express in closed form. However, the definite integral can be expressed in closed form; it turns out to be $\sqrt{\pi}/2$. The trick for proving this is given in example 99 on p. 134.
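The value claimed in example 74 can be checked numerically without knowing the trick. This sketch applies the midpoint rule, which never evaluates the integrand exactly at $x = \pi/2$ where $\tan x$ blows up; the number of subintervals is an arbitrary choice for this illustration.

```python
import math

def integrand(x):
    t = math.tan(x)
    # exp(-t^2) underflows harmlessly to 0.0 when t is huge, so the
    # product stays finite near the upper limit.
    return math.exp(-t * t) * (t * t + 1)

n = 20000                 # number of subintervals (arbitrary)
h = (math.pi / 2) / n

# Midpoint rule: sample each subinterval at its center.
numeric = h * sum(integrand((i + 0.5) * h) for i in range(n))
expected = math.sqrt(math.pi) / 2
print(numeric, expected)
```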

Sometimes  computer  software 
can’t  say  anything  about  a  partic¬ 
ular  integral  at  all.  That  doesn’t 
mean  that  the  integral  can’t  be 
done.  Computers  are  stupid, 
and  they  may  try  brute-force 
techniques  that  fail  because  the 
computer  runs  out  of  memory 
or CPU time. For example, the integral $\int dx/(x^{10000} - 1)$ (problem 15, p. 127) can be done in
closed  form  using  the  techniques 
of  chapter  8,  and  it’s  not  too  hard 
for  a  proficient  human  to  figure 
out  how  to  attack  it,  but  every 
computer  program  I’ve  tried  it  on 
has  failed  silently. 
Problems


1 Graph the function $y = e^x - 7x$ and get an approximate idea of where any of its zeroes are (i.e., for what values of $x$ we have $y(x) = 0$). Use Newton’s method to find the zeroes to three significant figures of precision.

2 The relationship between $x$ and $y$ is given by $xy = \sin y + x^2y^2$.

(a)  Use  Newton’s  method  to  find 
the  nonzero  solution  for  y  when 
x  =  3.  Answer:  y  =  0.2231 

(b)  Find  dy/ dx  in  terms  of  x  and 
y,  and  evaluate  the  derivative  at 
the  point  on  the  curve  you  found  in 
part  a.  Answer:  dy/ dx  =  —0.0379 

Based  on  an  example  by  Craig  B. 
Watkins. 

3 Suppose you want to evaluate

$\int \frac{dx}{1 + \sin 2x}$,

and you’ve found

$\int \frac{dx}{1 + \sin x} = -\tan\left(\frac{\pi}{4} - \frac{x}{2}\right)$

in a table of integrals. Use a change of variable to find the answer to the original problem.

4 Evaluate

$\int \frac{\sin x\,dx}{1 + \cos x}$.

5 Evaluate

$\int \frac{\sin x\,dx}{1 + \cos^2 x}$.

Evaluate

! a — x dx.

Evaluate

bx 2 dx,

where b is a constant.
8 Evaluate

Evaluate

$\int x e^x\,dx$.

10 Use integration by parts to evaluate the following integrals.

$\int \sin^{-1} x\,dx$

$\int \cos^{-1} x\,dx$

$\int \tan^{-1} x\,dx$

11 Evaluate

$\int x^2 \sin x\,dx$.

Hint: Use integration by parts more than once.

12 Evaluate

$\int \frac{dx}{x^2 - x - 6}$.
13 Evaluate

$\int \frac{dx}{x^3 + 3x^2 - 4}$.

14 Evaluate

$\int \frac{dx}{x^3 - x^2 + 4x - 4}$.

15 Apply integration by parts twice to

$\int e^{-x} \cos x\,dx$,

examine  what  happens,  and  ma¬ 
nipulate  the  result  in  order  to  solve 
the  original  integral.  (An  approach 
that  doesn’t  rely  on  tricks  is  given 
in  example  91  on  p.  123.) 

16  Plan,  but  do  not  actually 
carry  out  the  steps  that  would  be 
required  in  order  to  generalize  the 
result  of  example  70  on  p.  91  in  or¬ 
der  to  evaluate 

xab~x  dx, 

where a and b are constants. Which is easier, the generalization from 2 to a, or the one from e to b? Do we need to introduce any restrictions on a or b?

>  Solution,  p.  191 

17 The integral $\int e^{-x^2}\,dx$ can’t
be  done  in  closed  form.  Knowing 
this,  use  a  change  of  variable  to 
write  down  a  different  integral  that 


also  can’t  be  done  in  closed  form. 

18  Consider  the  integral 

$\int e^{x^p}\,dx$,

where $p$ is a constant. There is an obvious substitution. If this is to result in an integral that can be evaluated in closed form by a series of integrations by parts, what are the possible values of $p$? Don’t actually complete the integral; just determine what values of $p$ will work. > Solution, p. 191

19 Evaluate the hundredth derivative of the function $(x^2 + 1)/(x^3 - x)$ using paper and pencil. [Vladimir Arnol’d]

>  Solution,  p.  191  ★ 
6 Improper integrals


6.1  Integrating  a 
function  that 
blows  up 

When  we  integrate  a  function  that 
blows  up  to  infinity  at  some  point 
in  the  interval  we’re  integrating, 
the  result  may  be  either  finite  or 
infinite. 

Example 75

> Integrate the function $y = 1/\sqrt{x}$ from $x = 0$ to $x = 1$.

> The function blows up to infinity at one end of the region of integration, but let’s just try evaluating it, and see what happens.

$\int_0^1 x^{-1/2}\,dx = 2x^{1/2}\Big|_0^1$
$= 2$

The result turns out to be finite. Intuitively, the reason for this is that the spike at $x = 0$ is very skinny, and gets skinny fast as we go higher and higher up.

a / The integral $\int_0^1 dx/\sqrt{x}$ is finite.

Example 76

> Integrate the function $y = 1/x^2$ from $x = 0$ to $x = 1$.

>

$\int_0^1 x^{-2}\,dx = -x^{-1}\Big|_0^1$
$= -1 + \frac{1}{0}$

Division by zero is undefined, so the result is undefined.

Another way of putting it, using the hyperreal number system, is that if we were to integrate from $\epsilon$ to 1, where $\epsilon$ was an infinitesimal number, then the result would be $-1 + 1/\epsilon$, which is infinite. The smaller we make $\epsilon$, the bigger the infinite result we get out.

Intuitively, the reason that this integral comes out infinite is that the spike at $x = 0$ is fat, and doesn't get skinny fast enough.
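The contrast between the skinny spike and the fat one can be seen numerically by starting the integration at a small but finite lower limit instead of an infinitesimal one. The sketch below simply evaluates the two antiderivatives from the examples above; the function names and the particular values of the lower limit are assumptions made for this illustration.

```python
def area_inverse_sqrt(eps):
    # Integral of x^(-1/2) from eps to 1, using the antiderivative 2 x^(1/2).
    return 2 - 2 * eps**0.5

def area_inverse_square(eps):
    # Integral of x^(-2) from eps to 1, using the antiderivative -1/x.
    return -1 + 1 / eps

for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"eps = {eps:g}:  1/sqrt(x) gives {area_inverse_sqrt(eps):.6f},"
          f"  1/x^2 gives {area_inverse_square(eps):.6g}")
```

As the lower limit shrinks, the first column settles down near 2 while the second grows without bound.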


b / The integral $\int_0^1 dx/x^2$ is infinite.

These two examples were examples of improper integrals.

6.2 Limits of integration at infinity

Another type of improper integral is one in which one of the limits of integration is infinite. The notation

$\int_a^\infty f(x)\,dx$

means the limit of $\int_a^H f(x)\,dx$, where $H$ is made to grow bigger and bigger. Alternatively, we can think of it as an integral in which the top end of the interval of integration is an infinite hyperreal number. A similar interpretation applies when the lower limit is $-\infty$, or when both limits are infinite.

Example 77

> Evaluate

$\int_1^\infty x^{-2}\,dx$

>

$\int_1^H x^{-2}\,dx = -x^{-1}\Big|_1^H$
$= -\frac{1}{H} + 1$

As $H$ gets bigger and bigger, the result gets closer and closer to 1, so the result of the improper integral is 1.

Note that this is the same graph as in example 75, but with the x and y axes interchanged; this shows that the two different types of improper integrals really aren’t so different.


c / The integral $\int_1^\infty dx/x^2$ is finite.



Example  78 

> Newton's law of gravity states that the gravitational force between two objects is given by $F = Gm_1m_2/r^2$, where $G$ is a constant, $m_1$ and $m_2$ are the objects’ masses, and $r$ is the center-to-center distance between them. Compute the work that must be done to take an object from the earth’s surface, at $r = a$, and remove it to infinity.

>

$W = \int_a^\infty \frac{Gm_1m_2}{r^2}\,dr$
$= Gm_1m_2 \int_a^\infty r^{-2}\,dr$
$= -Gm_1m_2\, r^{-1}\Big|_a^\infty$
$= \frac{Gm_1m_2}{a}$

The answer is inversely proportional to $a$. In other words, if we were able to start from higher up, less work would have to be done.
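To get a feeling for the size of this result, the sketch below plugs in rough values for the earth and a 1 kg object, and also shows how the work needed to reach a finite radius $H$ creeps up toward the limiting value as $H$ grows. The numerical constants are approximate textbook values that are not given in this example, and the function name is an assumption of this illustration.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (approximate)
M_EARTH = 5.972e24   # mass of the earth, kg (approximate)
A_EARTH = 6.371e6    # radius of the earth, m (approximate)
m = 1.0              # mass of the object being lifted, kg

def work_to_radius(H):
    # Integral of G m1 m2 / r^2 from a to H, i.e. G m1 m2 (1/a - 1/H).
    return G * M_EARTH * m * (1 / A_EARTH - 1 / H)

work_to_infinity = G * M_EARTH * m / A_EARTH

for factor in (2, 10, 100, 1000):
    W = work_to_radius(factor * A_EARTH)
    print(f"H = {factor:5d} a:  W = {W:.4e} J")
print(f"H -> infinity:  W = {work_to_infinity:.4e} J")
```

With these inputs the limiting value comes out to roughly 6 x 10^7 joules per kilogram.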
Problems

1  Integrate 

e~x  da-, 

or  show  that  it  diverges. 

2  Integrate 

r  f. 

or  show  that  it  diverges. 

3  Integrate 


or  show  that  it  diverges. 

4  Integrate 

x22~x  dx, 

or  show  that  it  diverges. 

>  Solution,  p.  191 


(b)  Find  the  average  value  of  x,  or 
show  that  it  diverges. 

(c)  Find  the  standard  deviation  of 
x,  or  show  that  it  diverges. 

8 Prove

$\int_0^\infty e^{-x} x^n\,dx = n!$.



5  Integrate 

$\int e^{-x} \cos x\,dx$

or  show  that  it  diverges.  (Problem 
15  on  p.  99  suggests  a  trick  for  do¬ 
ing  the  indefinite  integral.) 

6  Prove  that 

e~e  dx 

converges,  but  don’t  evaluate  it. 


7 (a) Verify that the probability distribution $dP/dx$ given in example 60 on page 80 is properly normalized.


7  Sequences  and  Series 


7.1  Infinite 
sequences 

Consider an infinite sequence of numbers like 1/2, 2/3, 3/4, 4/5, . . . We want to define this as approaching 1, or “converging to 1.” The way to do this is to make a function $f(n)$, which is only well defined for integer values of $n$. Then $f(1) = 1/2$, $f(2) = 2/3$, and in general $f(n) = n/(n+1)$. With just a little tinkering, our definitions of limits can be applied to this type of function (see problem 1 on page 114).

7.2  Infinite  series 

A related question is how to rigorously define the sum of infinitely many numbers, which is referred to as an infinite series. An example is the geometric series $1 + x + x^2 + x^3 + \ldots = 1/(1 - x)$, which we used casually on page 29. The general concept of an infinite series goes back to ancient Greek mathematics. Various supposed paradoxes about infinite series, such as Zeno’s paradox, were exhibited, influencing Euclid to sidestep the issue in his Elements, where in Book IX, Proposition 35 he provides only an expression $(1 - x^n)/(1 - x)$ for the nth partial sum of the geometric series. The case where $n$ gets so big that $x^n$ becomes negligible is left to the reader’s imagination, as in one of those scenes in a romance novel that ends with something like “...and she surrendered...” For those with modern training, the idea is that an infinite sum like $1 + 1 + 1 + \ldots$ would clearly give an infinite result, but this is only because the terms are all staying the same size. If the terms get smaller and smaller, and get smaller fast enough, then the result can be finite. For example, consider the geometric series in the case where $x = 1/2$, for which we expect the result $1/(1 - 1/2) = 2$. We have

$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots$,

which at the successive steps of addition equals $1$, $1\tfrac{1}{2}$, $1\tfrac{3}{4}$, $1\tfrac{7}{8}$, .... We’re getting closer and closer to 2, cutting the distance in half at each step. Clearly we can get as close as we like to 2, if we’re willing to add enough terms.

Note that we ended up wanting to talk about the partial sums of the series. This is the right way to get a rigorous definition of the convergence of series in general. In the case of the geometric series, for example, we can define a sequence of the partial sums $1$, $1 + x$, $1 + x + x^2$, . . . We can then define convergence and limits of series in terms of convergence and limits of the partial sums.

It’s  instructive  to  see  what  hap¬ 
pens  to  the  geometric  series  with 
x  =  0.1.  The  geometric  series  be¬ 
comes 

1  +  0.1  +  0.01  +  0.001  +  .... 

The partial sums are 1, 1.1, 1.11, 1.111, ... We can see vividly here
that  adding  another  term  will  only 
affect  the  result  in  a  certain  deci¬ 
mal  place,  without  affecting  any  of 
the  earlier  ones.  For  instance,  if 
we  needed  a  result  that  was  valid 
to  three  digits  past  the  decimal 
place,  we  could  stop  at  1.111,  be¬ 
ing  assured  that  we  had  attained  a 
good  enough  approximation.  If  we 
wanted  an  exact  result,  we  could 
also  observe  that  multiplying  the 
result  by  9  would  give  9.999..., 
which  is  the  same  as  10,  so  the 
result  must  be  10/9,  which  is  in 
agreement  with  1/(1  —  1/10)  = 
10/9. 
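The partial sums discussed in this section are easy to tabulate exactly. The sketch below uses Python's standard `fractions` module so that no rounding enters; the function name `partial_sums` and the number of terms shown are choices made for this illustration.

```python
from fractions import Fraction

def partial_sums(x, count):
    """Return the first `count` partial sums of 1 + x + x^2 + ..."""
    sums, total, term = [], Fraction(0), Fraction(1)
    for _ in range(count):
        total += term
        sums.append(total)
        term *= x
    return sums

half = partial_sums(Fraction(1, 2), 10)
tenth = partial_sums(Fraction(1, 10), 5)

for k, s in enumerate(half):
    # The gap to the limit 2 is cut in half at each step.
    print(f"x = 1/2, {k + 1:2d} terms: sum = {s}, gap to 2 = {2 - s}")
for k, s in enumerate(tenth):
    print(f"x = 1/10, {k + 1} terms: sum = {float(s)}, gap to 10/9 = {Fraction(10, 9) - s}")
```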

One  thing  to  watch  out  for  with 
infinite  series  is  that  the  axioms  of 
the  real  number  system  only  talk 
about  finite  sums,  so  it’s  easy  to 
get  wrong  results  by  attempting 
to  apply  them  to  infinite  ones  (see 
problem  2  on  page  114). 

7.3  Tests  for 
convergence 

There are many different tests that can be used to determine whether a sequence or series converges. I’ll briefly state four of the most useful, with sketches of their proofs.


Bounded and increasing sequences: A sequence that always increases, but never surpasses a certain value, converges.

This  amounts  to  a  restatement  of 
the  completeness  axiom  for  the  real 
numbers  stated  on  page  157,  and 
is  therefore  to  be  interpreted  not 
so  much  as  a  statement  about  se¬ 
quences  but  as  one  about  the  real 
number  system.  In  particular,  it 
fails  if  interpreted  as  a  statement 
about  sequences  confined  entirely 
to  the  rational  number  system, 
as we can see from the sequence 1, 1.4, 1.41, 1.414, ... consisting of the successive decimal approximations to $\sqrt{2}$, which does not converge to any rational-number value.

Example  79 

> Prove that the geometric series 1 + 1/2 + 1/4 + . . . converges.

>  The  sequence  of  partial  sums  is  in¬ 
creasing,  since  each  term  is  positive. 
Each  term  closes  half  of  the  remain¬ 
ing  gap  separating  the  previous  par¬ 
tial  sum  from  2,  so  the  sum  never  sur¬ 
passes  2.  Since  the  partial  sums  are 
increasing  and  bounded,  they  con¬ 
verge  to  a  limit. 

Once we know that a particular series converges, we can also easily infer the convergence of other series whose terms get smaller faster. For example, we can be certain that if the geometric series converges, so does the series

$\frac{1}{1} + \frac{1}{1 \times 2} + \frac{1}{1 \times 2 \times 3} + \ldots$,

whose terms get smaller faster than any base raised to the power n.

Alternating series with terms approaching zero: If the terms of a series alternate in sign, decrease in absolute value, and approach zero, then the series converges.

Sketch  of  a  proof:  The  even  par¬ 
tial  sums  form  an  increasing  se¬ 
quence,  the  odd  sums  a  decreas¬ 
ing  one.  Neither  of  these  sequences 
of  partial  sums  can  be  unbounded, 
since  the  difference  between  partial 
sums  n  and  n+  1  would  then  have 
to  be  unbounded,  but  this  differ¬ 
ence  is  simply  the  nth  term,  and 
the  terms  approach  zero.  Since 
the  even  partial  sums  are  increas¬ 
ing  and  bounded,  they  converge 
to  a  limit,  and  similarly  for  the 
odd  ones.  The  two  limits  must 
be  equal,  since  the  terms  approach 
zero. 

Example  80 

> Prove that the series 1 − 1/2 + 1/3 − 1/4 + . . . converges.

> Its convergence follows because it is an alternating series whose terms decrease toward zero. The sum turns out to be ln 2, although the convergence of the series is so slow that an extremely large number of terms is required in order to obtain a decent approximation.
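How slow is slow? The sketch below adds up the first n terms for a few values of n and prints the distance from ln 2. It uses ordinary floating-point arithmetic, which is accurate enough at these sizes; the function name is an assumption of this illustration.

```python
import math

def alternating_partial_sum(n):
    """Sum of the first n terms of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

ln2 = math.log(2)
for n in (10, 100, 1000, 10000):
    s = alternating_partial_sum(n)
    print(f"n = {n:6d}:  sum = {s:.6f},  error = {abs(s - ln2):.2e}")
```

The error shrinks only about as fast as 1/(2n), so each additional correct decimal place costs roughly ten times as many terms. The even-numbered partial sums also sit below ln 2 and the odd-numbered ones above it, as in the proof sketch.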

The integral test: If the terms of a series $a_n$ are positive and decreasing, and $f(x)$ is a positive and decreasing function on the real number line such that $f(n) = a_n$, then the sum of $a_n$ from $n = 1$ to $\infty$ converges if and only if $\int_1^\infty f(x)\,dx$ does.

Sketch  of  proof:  Since  the  theo¬ 
rem  is  supposed  to  hold  for  both 
convergence  and  divergence,  and 
is  also  an  “if  and  only  if,”  there 
are  actually  four  cases  to  prove,  of 
which  we  pick  the  representative 
one  where  the  integral  is  known  to 
converge  and  we  want  to  prove  con¬ 
vergence  of  the  corresponding  sum. 
The  sum  and  the  integral  can  be 
interpreted  as  the  areas  under  two 
graphs:  one  like  a  smooth  ramp 
and  one  like  a  staircase.  Sliding  the 
staircase  half  a  unit  to  the  left,  it 
lies  entirely  underneath  the  ramp, 
and  therefore  the  area  under  it  is 
also  finite. 

Example  81 

> Prove that the series $1 + 1/2 + 1/3 + \ldots$ diverges.

> The integral of $1/x$ is $\ln x$, which diverges as $x$ approaches
infinity, so the series diverges as well.

The ratio test: If the limit $R = \lim_{n\to\infty} |a_{n+1}/a_n|$
exists, then the sum of $a_n$ converges if $R < 1$ and diverges if
$R > 1$.

The  proof  can  be  obtained  by  com¬ 
paring  with  a  geometric  series. 

Example  82 

> Prove that the series $1 + 1/2^2 + 1/3^3 + \ldots$ converges.

> $R$ is easily proved to be 0, so the sum converges by the ratio test.
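For readers who want the missing step, here is one way to see that $R = 0$. This is a sketch that assumes the general term is $a_n = 1/n^n$, which is what the first three terms suggest.

```latex
\begin{align*}
\frac{a_{n+1}}{a_n}
  &= \frac{n^n}{(n+1)^{n+1}}
   = \frac{1}{n+1}\left(\frac{n}{n+1}\right)^{n}
   < \frac{1}{n+1}, \\
R &= \lim_{n\to\infty}\frac{a_{n+1}}{a_n} = 0 .
\end{align*}
```

The factor $(n/(n+1))^n$ is always less than 1, so the ratio is squeezed between 0 and $1/(n+1)$.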

At this point it will seem like a mystery how anyone could have
proved the exact results claimed for some of the “special” series,
such as $1 - 1/2 + 1/3 - 1/4 + \ldots = \ln 2$. Problems like these
are not the main focus of the chapter, and in fact there is no
well-defined toolbox of techniques that will allow any such “nice”
series to be evaluated exactly. Even a relatively innocent-looking
example like $1^{-2} + 2^{-2} + 3^{-2} + \ldots$ defeated some of the
best mathematicians of Europe for years (see problem 16, p. 116). It
is currently unknown whether some apparently simple series such as
$\sum_{n=1}^{\infty} 1/(n^3 \sin^2 n)$ converge.1

7.4  Taylor  series 

If you calculate $e^{0.1}$ on your calculator, you’ll find that it’s
very close to 1.1. This is because the tangent line at $x = 0$ on the
graph of $e^x$ has a slope of 1 ($de^x/dx = e^x = 1$ at $x = 0$), and
the tangent line is a good approximation to the exponential curve as
long as we don’t get too far away from the point of tangency.


1 Alekseyev, “On convergence of the Flint Hills series,”
arxiv.org/abs/1104.5100v1

a / The function $e^x$, and the tangent line at $x = 0$.

How big is the error? The actual value of $e^{0.1}$ is
1.10517091807565..., which differs from 1.1 by about 0.005. If we go
farther from the point of tangency, the approximation gets worse. At
$x = 0.2$, the error is about 0.021, which is about four times bigger.
In other words, doubling $x$ seems to roughly quadruple the error, so
the error is proportional to $x^2$; it seems to be about $x^2/2$.
Well, if we want a handy-dandy, super-accurate estimate of $e^x$ for
small values of $x$, why not just account for this error? Our new and
improved estimate is

$$e^x \approx 1 + x + \frac{1}{2}x^2$$

for small values of $x$.
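The error estimates quoted here are easy to reproduce. The sketch below is illustrative only (the helper names `linear` and `quadratic` are not from the text); it prints the error of the tangent-line estimate and of the estimate that also includes the $x^2/2$ term.

```python
import math

def linear(x):
    """Tangent-line approximation to e**x at x = 0."""
    return 1 + x

def quadratic(x):
    """Approximation that also accounts for the x**2/2 error term."""
    return 1 + x + x**2 / 2

for x in (0.1, 0.2):
    exact = math.exp(x)
    # columns: x, error of the linear estimate, error of the quadratic one
    print(x, exact - linear(x), exact - quadratic(x))
```

Doubling x roughly quadruples the first error column, while the second column is much smaller in both cases.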


b / The function $e^x$, and the approximation $1 + x + x^2/2$.

Figure b shows that the approximation is now extremely good for
sufficiently small values of $x$. The difference is that whereas
$1 + x$ matched both the y-intercept and the slope of the curve,
$1 + x + x^2/2$ matches the curvature as well. Recall that the second
derivative is a measure of curvature. The second derivatives of the
function and its approximation, evaluated at $x = 0$, are

$$\frac{d^2}{dx^2}\, e^x = 1$$

$$\frac{d^2}{dx^2}\left(1 + x + \frac{1}{2}x^2\right) = 1$$

We can do even better. Suppose we want to match the third
derivatives. All the derivatives of $e^x$, evaluated at $x = 0$, are
1, so we just need to add on a term proportional to $x^3$ whose third
derivative is one. Taking the first derivative will bring down a
factor of 3 in front, and taking the second derivative will give a 2,
so to cancel these out we need the coefficient of the third-order
term to be $(1/2)(1/3)$:

$$e^x \approx 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3$$

c / The function $e^x$, and the approximation
$1 + x + x^2/2 + x^3/6$.

Figure c shows the result. For a significant range of $x$ values
close to zero, the approximation is now so good that we can’t even
see the difference between the two functions on the graph.

On the other hand, figure d shows that the cubic approximation for
somewhat larger negative and positive values of $x$ is poor; worse,
in fact, than the linear approximation, or even the constant
approximation $e^x = 1$. This is to be expected, because any
polynomial will blow up to either positive or negative infinity as
$x$ approaches negative infinity, whereas the function $e^x$ is
supposed to get very close to zero for large negative $x$. The idea
here is that derivatives are local things: they only measure the
properties of a function very close to the point at which they’re
evaluated, and they don’t necessarily tell us anything about points
far away.

It’s a remarkable fact, then, that by taking enough terms in a
polynomial approximation, we can always get as good an approximation
to $e^x$ as necessary; it’s just that a large number of terms may be
required for large values of $x$. In other words, the infinite series

$$1 + x + \frac{1}{2}x^2 + \frac{1}{2\cdot 3}x^3 + \ldots$$

always gives exactly $e^x$. But what is the pattern here that would
allow us to figure out, say, the fourth-order and fifth-order terms
that were swept under the rug with the symbol “...”? Let’s do the
fifth-order term as an example. The point of adding in a fifth-order
term is to make the fifth derivative of the approximation equal to
the fifth derivative of $e^x$, which is 1. The first, second, ...
derivatives of $x^5$ are

$$\frac{d}{dx}x^5 = 5x^4$$

$$\frac{d^2}{dx^2}x^5 = 5\cdot 4x^3$$

$$\frac{d^3}{dx^3}x^5 = 5\cdot 4\cdot 3x^2$$

$$\frac{d^4}{dx^4}x^5 = 5\cdot 4\cdot 3\cdot 2x$$

$$\frac{d^5}{dx^5}x^5 = 5\cdot 4\cdot 3\cdot 2\cdot 1$$

The notation for a product like $1\cdot 2\cdot \ldots \cdot n$ is
$n!$, read “n factorial.” So to get a term for our polynomial whose
fifth derivative is 1, we need $x^5/5!$. The result for the infinite
series is

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

where the special case of $0! = 1$ is assumed.2 This infinite series
is called the Taylor series for $e^x$, evaluated around $x = 0$, and
it’s true, although I haven’t proved it, that this particular Taylor
series always converges to $e^x$, no matter how far $x$ is from zero.

d / The function $e^x$, and the approximation
$1 + x + x^2/2 + x^3/6$, on a wider scale.

In general, the Taylor series around $x = 0$ for a function $y$ is

$$T_0(x) = \sum_{n=0}^{\infty} a_n x^n,$$

where the condition for equality of the nth order derivative is

$$a_n = \frac{1}{n!}\left.\frac{d^n y}{dx^n}\right|_{x=0}.$$

Here the notation $|_{x=0}$ means that the derivative is to be
evaluated at $x = 0$.
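As a concrete instance of this formula, every derivative of $e^x$ at $x = 0$ equals 1, so $a_n = 1/n!$. The sketch below is a hypothetical helper, not something from the text; it sums the first few terms and compares the result with the library exponential.

```python
import math

def exp_taylor(x, n_terms):
    """Sum of the first n_terms terms of the series sum x**n / n!."""
    total = 0.0
    term = 1.0  # the n = 0 term, x**0 / 0!
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # turn x**n/n! into x**(n+1)/(n+1)!
    return total

# Larger x needs more terms before the partial sums settle down.
for x in (0.1, 1.0, 10.0):
    for n_terms in (4, 10, 40):
        print(x, n_terms, exp_taylor(x, n_terms), math.exp(x))
```

Building each term from the previous one avoids computing large powers and factorials separately.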

2 This makes sense, because, for example, 4!=5!/5, 3!=4!/4, etc., so
we should have 0!=1!/1.

A Taylor series can be used to approximate other functions besides
$e^x$, and when you ask your calculator to evaluate a function such
as a sine or a cosine, it may actually be using a Taylor series to do
it. Taylor series are also the method Inf uses to calculate most
expressions involving infinitesimals. In example 13 on page 29, we
saw that when Inf was asked to calculate $1/(1 - d)$, where $d$ was
infinitesimal, the result was the geometric series:

: 1/(1-d)
1+d+d^2+d^3+d^4

These are also the first five terms of the Taylor series for the
function $y = 1/(1 - x)$, evaluated around $x = 0$. That is, the
geometric series $1 + x + x^2 + x^3 + \ldots$ is really just one
special example of a Taylor series, as demonstrated in the following
example.

Example  83 

> Find the Taylor series of $y = 1/(1 - x)$ around $x = 0$.

> Rewriting the function as $y = (1 - x)^{-1}$ and applying the chain
rule, we have

$$y|_{x=0} = 1$$

$$\left.\frac{dy}{dx}\right|_{x=0} = (1 - x)^{-2}\big|_{x=0} = 1$$

$$\left.\frac{d^2y}{dx^2}\right|_{x=0} = 2(1 - x)^{-3}\big|_{x=0} = 2$$

$$\left.\frac{d^3y}{dx^3}\right|_{x=0} = 2\cdot 3(1 - x)^{-4}\big|_{x=0} = 2\cdot 3$$

The pattern is that the nth derivative is $n!$. The Taylor series
therefore has $a_n = n!/n! = 1$:

$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \ldots$$

If  you  flip  back  to  page  106  and 
compare  the  rate  of  convergence  of 
the  geometric  series  for  x  =  0.1 
and  0.5,  you’ll  see  that  the  sum 
converged  much  more  quickly  for 
x  =  0.1  than  for  x  =  0.5.  In 
general,  we  expect  that  any  Taylor 
series  will  converge  more  quickly 
when  x  is  smaller.  Now  consider 
what  happens  at  x  =  1.  The  series 
is  now  1  +  1  +  1  +  . . . ,  which  gives 
an  infinite  result,  and  we  shouldn’t 
have  expected  any  better  behav¬ 
ior,  since  attempting  to  evaluate 
1/(1  —  x)  at  x  =  1  gives  divi¬ 
sion  by  zero.  For  x  >  1,  the  re¬ 
sults  become  nonsense.  For  exam¬ 
ple,  1/(1  —  2)  =  —  1,  which  is  fi¬ 
nite,  but  the  geometric  series  gives 
1  +  2  +  4  +  . . .,  which  is  infinite. 
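The behavior described in this paragraph can be checked directly. The following sketch uses a made-up helper, `geometric_partial_sum`; it compares partial sums with $1/(1 - x)$ inside the range of convergence and then shows the partial sums growing without limit at $x = 2$.

```python
def geometric_partial_sum(x, n_terms):
    """1 + x + x**2 + ... using n_terms terms."""
    return sum(x**n for n in range(n_terms))

for x in (0.1, 0.5):
    exact = 1 / (1 - x)
    for n_terms in (5, 10, 20):
        print(x, n_terms, geometric_partial_sum(x, n_terms), exact)

# At x = 2 the partial sums are 1, 3, 7, 15, 31, ... and never approach
# the finite value 1/(1 - 2) = -1.
print([geometric_partial_sum(2, n) for n in range(1, 6)])
```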

In general, every function’s Taylor series around $x = 0$ converges
for all values of $x$ in the range defined by $|x| < r$, where $r$ is
some number, known as the radius of convergence. Also, if the
function is defined by putting together other functions that are well
behaved (in the sense of converging to their own Taylor series in the
relevant region), then the Taylor series will not only converge but
converge to the correct value. For the function $e^x$, the radius
happens to be infinite, whereas for $1/(1 - x)$ it equals 1. The
following example shows a worst-case scenario.


Example  84 

The function $y = e^{-1/x^2}$, shown in figure e, never converges to
its Taylor series, except at $x = 0$. This is because the Taylor
series for this function, evaluated around $x = 0$, is exactly zero!
At $x = 0$, we have $y = 0$, $dy/dx = 0$, $d^2y/dx^2 = 0$, and so on
for every derivative. The zero function matches the function $y(x)$
and all its derivatives to all orders, and yet is useless as an
approximation to $y(x)$. The radius of convergence of the Taylor
series is infinite, but it doesn't give correct results except at
$x = 0$. The reason for this is that $y$ was built by composing two
functions, $w(x) = -1/x^2$ and $y(w) = e^w$. The function $w$ is
badly behaved at $x = 0$ because it blows up there. In particular, it
doesn't have a well-defined Taylor series at $x = 0$.

e / The function $e^{-1/x^2}$ never converges to its Taylor series.


Example  85 

> Find the Taylor series of $y = \sin x$, evaluated around $x = 0$.

> The first few derivatives are

$$\frac{d}{dx}\sin x = \cos x$$

$$\frac{d^2}{dx^2}\sin x = -\sin x$$

$$\frac{d^3}{dx^3}\sin x = -\cos x$$

$$\frac{d^4}{dx^4}\sin x = \sin x$$

$$\frac{d^5}{dx^5}\sin x = \cos x$$

We can see that there will be a cycle of $\sin$, $\cos$, $-\sin$, and
$-\cos$, repeating indefinitely. Evaluating these derivatives at
$x = 0$, we have 0, 1, 0, $-1$, .... All the even-order terms of the
series are zero, and all the odd-order terms are $\pm 1/n!$. The
result is

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \ldots$$

The linear term is the familiar small-angle approximation
$\sin x \approx x$.

The radius of convergence of this series turns out to be infinite.
Intuitively the reason for this is that the factorials grow extremely
rapidly, so that the successive terms in the series eventually start
diminishing quickly, even for large values of $x$.
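To see the effect of the factorials, one can sum the series at a fairly large value such as $x = 10$. This is a rough sketch with an illustrative function name, `sin_taylor`; the early partial sums are wildly off, but once the factorials take over the sum settles onto the library value.

```python
import math

def sin_taylor(x, n_terms):
    """x - x**3/3! + x**5/5! - ... using n_terms nonzero terms."""
    total = 0.0
    term = x  # first nonzero term of the series
    for k in range(n_terms):
        total += term
        # the next odd-order term differs by -x**2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

for n_terms in (5, 10, 20, 40):
    print(n_terms, sin_taylor(10.0, n_terms), math.sin(10.0))
```

For small x, such as 0.5, a handful of terms is already enough.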

Example  86 

Suppose that we want to evaluate a limit of the form

$$\lim_{x\to 0} \frac{u(x)}{v(x)},$$

where $u(0) = v(0) = 0$. L'Hôpital's rule tells us that we can do
this by taking derivatives on the top and bottom to form $u'/v'$, and
that, if necessary, we can do more than one derivative, e.g.,
$u''/v''$. This was proved on p. 152 using the mean value theorem.
But if $u$ and $v$ are both functions that converge to their Taylor
series, then it is much easier to see why this works. For example,
suppose that their Taylor series both have vanishing constant and
linear terms, so that $u = ax^2 + \ldots$ and $v = bx^2 + \ldots$.
Then $u'' = 2a + \ldots$, and $v'' = 2b + \ldots$, so (assuming
$b \neq 0$) both $u/v$ and $u''/v''$ approach $a/b$ as $x$
approaches 0.
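A concrete case may make the argument easier to follow. The choice $u = 1 - \cos x$ and $v = x^2$ is an illustration added here, not an example from the text, and it uses the standard cosine series $\cos x = 1 - x^2/2! + x^4/4! - \ldots$.

```latex
\begin{align*}
u(x) &= 1 - \cos x = \tfrac{1}{2}x^2 - \tfrac{1}{24}x^4 + \ldots,
  & v(x) &= x^2, \\
\lim_{x\to 0}\frac{u}{v}
  &= \lim_{x\to 0}\left(\tfrac{1}{2} - \tfrac{1}{24}x^2 + \ldots\right)
   = \tfrac{1}{2},
  & \lim_{x\to 0}\frac{u''}{v''}
  &= \lim_{x\to 0}\frac{\cos x}{2} = \tfrac{1}{2}.
\end{align*}
```

Here $a = 1/2$ and $b = 1$, and both routes give $a/b = 1/2$.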

A function’s Taylor series doesn’t have to be evaluated around
$x = 0$. The Taylor series around some other center $x = c$ is given
by

$$T_c(x) = \sum_{n=0}^{\infty} a_n (x - c)^n,$$

where

$$a_n = \frac{1}{n!}\left.\frac{d^n y}{dx^n}\right|_{x=c}.$$

To see that this is the right generalization, we can do a change of
variable, defining a new function $g(x) = f(x - c)$. The radius of
convergence is to be measured from the center $c$ rather than from 0.

Example  87 

> Find the Taylor series of $\ln x$, evaluated around $x = 1$.

> The first few derivatives are

$$\frac{d}{dx}\ln x = x^{-1}$$

$$\frac{d^2}{dx^2}\ln x = -x^{-2}$$

$$\frac{d^3}{dx^3}\ln x = 2x^{-3}$$

$$\frac{d^4}{dx^4}\ln x = -6x^{-4}$$

Note that evaluating these at $x = 0$ wouldn’t have worked, since
division by zero is undefined; this is because $\ln x$ blows up to
negative infinity at $x = 0$. Evaluating them at $x = 1$, we find
that the nth derivative equals $\pm(n - 1)!$, so the coefficients of
the Taylor series are $\pm(n - 1)!/n! = \pm 1/n$, except for the
$n = 0$ term, which is zero because $\ln 1 = 0$. The resulting series
is

$$\ln x = (x - 1) - \frac{1}{2}(x - 1)^2 + \frac{1}{3}(x - 1)^3 - \ldots$$

We can predict that its radius of convergence can't be any greater
than 1, because $\ln x$ blows up at 0, which is at a distance of 1
from 1.
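The prediction about the radius of convergence can be tested numerically. In this sketch (the name `ln_taylor` is hypothetical), the series is summed at $x = 1.5$, which is within a distance of 1 from the center, and at $x = 3$, which is not.

```python
import math

def ln_taylor(x, n_terms):
    """(x-1) - (x-1)**2/2 + (x-1)**3/3 - ... using n_terms terms."""
    t = x - 1
    return sum((-1) ** (n + 1) * t**n / n for n in range(1, n_terms + 1))

# Inside the radius of convergence the partial sums approach ln x.
print(ln_taylor(1.5, 40), math.log(1.5))

# Outside it, the terms grow and the partial sums swing ever more wildly.
print([ln_taylor(3.0, n) for n in (10, 20, 40)])
```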
Problems

1  Modify  the  Weierstrass  defini¬ 
tion  of  the  limit  to  apply  to  infinite 
sequences.  >  Solution,  p.  192 

2  (a)  Prove  that  the  infinite  se¬ 
ries  1  —  1  +  1  —  1  +  1  —  1  +  ... 
does  not  converge  to  any  limit,  us¬ 
ing  the  generalization  of  the  Weier¬ 
strass  limit  found  in  problem  1. 
(b)  Criticize  the  following  argu¬ 
ment.  The  series  given  in  part  a 
equals  zero,  because  addition  is  as¬ 
sociative,  so  we  can  rewrite  it  as 
(1-1)  +  (1-1)  +  (1-1)  +  ... 

I>  Solution,  p.  192 

3  Use  the  integral  test  to  prove 
the  convergence  of  the  geometric 
series  for  0  <  x  <  1. 

>  Solution,  p.  192 

4 Determine the convergence or divergence of the following series.

(a) $1 + 1/2^2 + 1/3^2 + \ldots$

(b) $1/\ln\ln 3 - 1/\ln\ln 6 + 1/\ln\ln 9 - 1/\ln\ln 12 + \ldots$

(c)

$$\frac{1}{\ln 2} + \frac{1}{(\ln 2)(\ln 3)} + \frac{1}{(\ln 2)(\ln 3)(\ln 4)} + \ldots$$

(d)

$$\frac{2\sqrt{2}}{9801}\sum_{k=0}^{\infty} \frac{(4k)!\,(1103 + 26390k)}{(k!)^4\, 396^{4k}}$$

> Solution, p. 192

5  Give  an  example  of  a  series  for 
which  the  ratio  test  is  inconclusive. 

[>  Solution,  p.  193 


6 Find the Taylor series expansion of $\cos x$ around $x = 0$. Check
your work by combining the first two terms of this series with the
first term of the sine function from example 85 on page 112 to verify
that the trig identity $\sin^2 x + \cos^2 x = 1$ holds for terms up
to order $x^2$.

7 In classical physics, the kinetic energy $K$ of an object of mass
$m$ moving at velocity $v$ is given by $K = \frac{1}{2}mv^2$. For
example, if a car is to start from a stoplight and then accelerate up
to $v$, this is the theoretical minimum amount of energy that would
have to be used up by burning gasoline. (In reality, a car’s engine
is not 100% efficient, so the amount of gas burned is greater.)

Einstein’s theory of relativity states that the correct equation is
actually

$$K = \left(\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right)mc^2,$$

where $c$ is the speed of light. The fact that it diverges as
$v \to c$ is interpreted to mean that no object can be accelerated to
the speed of light.

Expand $K$ in a Taylor series, and show that the first nonvanishing
term is equal to the classical expression. This means that for
velocities that are small compared to the speed of light, the
classical expression is a good approximation, and Einstein’s theory
does not contradict any of the prior empirical evidence from which
the classical expression was inferred.

8 Expand $(1 + x)^{1/3}$ in a Taylor series around $x = 0$. The value
$x = 28$ lies outside this series’ radius of convergence, but we can
nevertheless use it to extract the cube root of 28 by recognizing
that $28^{1/3} = 3(28/27)^{1/3}$. Calculate the root to four
significant figures of precision, and check it in the obvious way.

9 Find the Taylor series expansion of $\log_2 x$ around $x = 1$, and
use it to evaluate $\log_2 1.0595$ to four significant figures of
precision. Check your result by using the fact that 1.0595 is
approximately the twelfth root of 2. This number is the ratio of the
frequencies of two successive notes of the chromatic scale in music,
e.g., C and D-flat.


10  In  free  fall,  the  acceleration 
will  not  be  exactly  constant,  due 
to  air  resistance.  For  example,  a 
skydiver  does  not  speed  up  indefi¬ 
nitely  until  opening  her  chute,  but 
rather  approaches  a  certain  maxi¬ 
mum  velocity  at  which  the  upward 
force  of  air  resistance  cancels  out 
the  force  of  gravity.  If  an  object  is 
dropped  from  a  height  h,  and  the 
time  it  takes  to  reach  the  ground  is 
used  to  measure  the  acceleration  of 
gravity,  g ,  then  the  relative  error  in 


where  b  =  h/A ,  and  A  is  a  constant 
that  depends  on  the  size,  shape, 
and  mass  of  the  object,  and  the 
density  of  the  air.  (For  a  sphere  of 
mass  m  and  diameter  d  dropping 
in  air,  A  =  4Alm/d2.  Cf.  problem 
20,  p.  49.)  Evaluate  the  constant 
and  linear  terms  of  the  Taylor  se¬ 
ries  for  the  function  E(b). 

11  (a)  Prove  that  the  conver¬ 

gence  of  an  infinite  series  is  un¬ 
affected  by  omitting  some  initial 
terms,  (b)  Similarly,  prove  that 
convergence  is  unaffected  by  mul¬ 
tiplying  all  the  terms  by  some  con¬ 
stant  factor. 


12  The  identity 


is  known  as  the  “Sophomore’s 
dream,”  because  at  first  glance  it 
looks  like  the  kind  of  plausible 
but  false  statement  that  someone 
would  naively  dream  up.  Verify  it 
numerically  by  machine  computa¬ 
tion. 


13 Does $\sin x + \sin\sin x + \sin\sin\sin x + \ldots$ converge?

> Solution, p. 194 ★

3  Jan  Benacka  and  Igor  Stubna,  The 
Physics  Teacher,  43  (2005)  432. 
14 Evaluate

$$1 + \frac{1}{1 + 2} + \frac{1}{1 + 2 + 3} + \ldots$$

> Solution, p. 193 ★


15  Evaluate 


~  ( — i)n 

^  n  +  1  +  1/n! 

n—0  ' 


to  six  decimal  places.  * 

16 Euler was the first to prove

$$\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \ldots = \frac{\pi^2}{6}.$$

This problem had defeated other great mathematicians of his time, and
was famous enough to be given a special name, the Basel problem. Here
we present an argument based closely on Euler’s and pose the problem
of how to exploit Euler’s technique further in order to prove

$$\frac{1}{1^4} + \frac{1}{2^4} + \frac{1}{3^4} + \ldots = \frac{\pi^4}{90}.$$

From the Taylor series for the sine function, we find the related
series

$$f(x) = \frac{\sin\sqrt{x}}{\sqrt{x}} = 1 - \frac{x}{3!} + \frac{x^2}{5!} - \ldots$$

The partial sums of this series are polynomials that approximate $f$
for small values of $x$. If such a polynomial were exact rather than
approximate, then it would have zeroes at
$x = \pi^2, 4\pi^2, 9\pi^2, \ldots$, and we could write it as the
product of its linear factors. Euler assumed, without any more
rigorous proof, that this factorization procedure could be extended
to the infinite series, so that $f$ could be represented as the
infinite product

$$f(x) = \left(1 - \frac{x}{\pi^2}\right)\left(1 - \frac{x}{4\pi^2}\right)\left(1 - \frac{x}{9\pi^2}\right)\ldots$$

By multiplying this out and equating its linear term to that of the
Taylor series, we find the claimed result.

Extend this procedure to the $x^2$ term and prove the result claimed
for the sum of the inverse fourth powers of the integers. (The sums
with odd exponents $\geq 3$ are much harder, and relatively little is
known about them. The sum of the inverse cubes is known as Apéry’s
constant.) ★

17 Does

$$\int_0^\infty \sin(x^2)\,dx$$

converge, or not?

> Solution, p. 193 ★

18 Evaluate

$$\lim_{n\to\infty} \cos\left(\pi\sqrt{n^2 - n}\right),$$

where $n$ is an integer. ★

19  Determine  the  convergence 
of  the  series 


OO 

I +2-". 

n— 0 

and  if  it  converges,  evaluate  it. 

>  Solution,  p.  194  4- 
20 Determine the convergence
of  the  series 


E^2"”’ 

n— 0 

and  if  it  converges,  evaluate  it. 

>  Solution,  p.  194  ★ 

21  For  what  integer  values  of  p 
should  we  expect  the  series 


E 


I  COS?l|" 

np 


to  converge?  A  rigorous  proof  is 
very  difficult  and  may  even  be  an 
open  problem,  but  it  is  relatively 
straightforward  to  give  a  convinc¬ 
ing  argument. 

>  Solution,  p.  195  ★ 




8  Complex  number 
techniques 


8.1  Review  of 
complex 
numbers 

For  a  more  detailed  treatment  of 
complex  numbers,  see  ch.  3  of 
James  Nearing’s  free  book  at 

http://www.physics.miami.edu/nearing/mathmethods/.


a  /  Visualizing  complex  numbers  as 
points  in  a  plane. 

We assume there is a number, $i$, such that $i^2 = -1$. The square
roots of $-1$ are then $i$ and $-i$. (In electrical engineering work,
where $i$ stands for current, $j$ is sometimes used instead.) This
gives rise to a number system, called the complex numbers, containing
the real numbers as a subset. Any complex number $z$ can be written
in the form $z = a + bi$, where $a$ and $b$ are real, and $a$ and $b$
are then referred to as the real and imaginary parts of $z$. A number
with a zero real part is called an imaginary number. The complex
numbers can be visualized as a plane, figure a, with the real number
line placed horizontally like the x axis of the familiar x-y plane,
and the imaginary numbers running along the y axis. The complex
numbers are complete in a way that the real numbers aren’t: every
nonzero complex number has two square roots. For example, 1 is a real
number, so it is also a member of the complex numbers, and its square
roots are $-1$ and 1. Likewise, $-1$ has square roots $i$ and $-i$,
and the number $i$ has square roots $1/\sqrt{2} + i/\sqrt{2}$ and
$-1/\sqrt{2} - i/\sqrt{2}$.

b / Addition of complex numbers is just like addition of vectors,
although the real and imaginary axes don’t actually represent
directions in space.

c / A complex number and its conjugate.

Complex  numbers  can  be  added 
and  subtracted  by  adding  or  sub¬ 
tracting  their  real  and  imaginary 
parts,  figure  b.  Geometrically,  this 
is  the  same  as  vector  addition. 

The complex numbers $a + bi$ and $a - bi$, lying at equal distances
above and below the real axis, are called complex conjugates. The
results of the quadratic formula are either both real, or complex
conjugates of each other. The complex conjugate of a number $z$ is
notated as $\bar{z}$ or $z^*$.

The  complex  numbers  obey  all  the 
same  rules  of  arithmetic  as  the  re¬ 
als,  except  that  they  can’t  be  or¬ 
dered  along  a  single  line.  That  is, 


it’s  not  possible  to  say  whether  one 
complex  number  is  greater  than 
another.  We  can  compare  them 
in  terms  of  their  magnitudes  (their 
distances  from  the  origin),  but 
two  distinct  complex  numbers  may 
have  the  same  magnitude,  so,  for 
example,  we  can’t  say  whether  1  is 
greater  than  i  or  i  is  greater  than 
1. 

Example  88 

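> Prove that $1/\sqrt{2} + i/\sqrt{2}$ is a square root of $i$.

> Our proof can use any ordinary rules of arithmetic, except for
ordering.

$$\left(\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\right)^2 = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\cdot\frac{i}{\sqrt{2}} + \frac{i}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\cdot\frac{i}{\sqrt{2}}$$

$$= \frac{1}{2}(1 + i + i - 1)$$

$$= i$$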

Example  88  showed  one  method 
of  multiplying  complex  numbers. 
However,  there  is  another  nice  in¬ 
terpretation  of  complex  multiplica¬ 
tion.  We  define  the  argument  of 
a  complex  number,  figure  d,  as  its 
angle  in  the  complex  plane,  mea¬ 
sured  counterclockwise  from  the 
positive  real  axis.  Multiplying 
two  complex  numbers  then  corre¬ 
sponds  to  multiplying  their  magni¬ 
tudes,  and  adding  their  arguments, 
figure  e. 
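This rule is easy to try out with Python's built-in complex numbers. The sketch below is an illustration with arbitrarily chosen inputs; `cmath.polar` returns the magnitude and the argument in radians. Note that it reports arguments in the range from $-\pi$ to $\pi$, so for other inputs the sum of the arguments may differ from the reported one by a full turn.

```python
import cmath

u = 1 + 1j   # magnitude sqrt(2), argument 45 degrees
v = 2j       # magnitude 2, argument 90 degrees

r_u, theta_u = cmath.polar(u)
r_v, theta_v = cmath.polar(v)
r_uv, theta_uv = cmath.polar(u * v)

print(r_uv, r_u * r_v)               # magnitudes multiply
print(theta_uv, theta_u + theta_v)   # arguments add (here 135 degrees)
```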

Self-Check

Using this interpretation of multiplication, how could you find the
square roots of a complex number? > Answer, p. 165

d / A complex number can be described in terms of its magnitude and
argument.

Example  89 

The magnitude $|z|$ of a complex number $z$ obeys the identity
$|z|^2 = z\bar{z}$. To prove this, we first note that $\bar{z}$ has
the same magnitude as $z$, since flipping it to the other side of the
real axis doesn’t change its distance from the origin. Multiplying
$z$ by $\bar{z}$ gives a result whose magnitude is found by
multiplying their magnitudes, so the magnitude of $z\bar{z}$ must
therefore equal $|z|^2$. Now we just have to prove that $z\bar{z}$ is
a positive real number. But if, for example, $z$ lies
counterclockwise from the real axis, then $\bar{z}$ lies clockwise
from it. If $z$ has a positive argument, then $\bar{z}$ has a
negative one, or vice-versa. The sum of their arguments is therefore
zero, so the result has an argument of zero, and is on the positive
real axis.1

1 I cheated a little. If $z$’s argument is 30 degrees, then we could
say $\bar{z}$’s was -30, but we could also call it 330. That’s OK,
because 330+30 gives 360, and an argument of 360 is the same as an
argument of zero.

e / The argument of $uv$ is the sum of the arguments of $u$ and $v$.

This whole system was built up in order to make every number have
square roots. What about cube roots, fourth roots, and so on? Does it
get even more weird when you want to do those as well? No. The
complex number system we’ve already discussed is sufficient to handle
all of them. The nicest way of thinking about it is in terms of roots
of polynomials. In the real number system, the polynomial $x^2 - 1$
has two roots, i.e., two values of $x$ (plus and minus one) that we
can plug in to the polynomial and get zero. Because it has these two
real roots, we can rewrite the polynomial as $(x - 1)(x + 1)$.
However, the polynomial $x^2 + 1$ has no real roots. It’s ugly that
in the real number system, some second-order polynomials have two
roots, and can be factored, while others can’t. In the complex number
system, they all can. For instance, $x^2 + 1$ has roots $i$ and $-i$,
and can be factored as $(x - i)(x + i)$. In general, the fundamental
theorem of algebra states that in the complex number system, any
nth-order polynomial can be factored completely into n linear
factors, and we can also say that it has n complex roots, with the
understanding that some of the roots may be the same. For instance,
the fourth-order polynomial $x^4 + x^2$ can be factored as
$(x - i)(x + i)(x - 0)(x - 0)$, and we say that it has four roots,
$i$, $-i$, 0, and 0, two of which happen to be the same. This is a
sensible way to think about it, because in real life, numbers are
always approximations anyway, and if we make tiny, random changes to
the coefficients of this polynomial, it will have four distinct
roots, of which two just happen to be very close to zero. I’ve given
a proof of the fundamental theorem of algebra on page 162.

8.2  Euler’s  formula 

Having  expanded  our  horizons  to 
include  the  complex  numbers,  it’s 
natural  to  want  to  extend  func¬ 
tions  we  knew  and  loved  from  the 
world  of  real  numbers  so  that  they 
can  also  operate  on  complex  num¬ 
bers.  The  only  really  natural  way 
to  do  this  in  general  is  to  use  Tay¬ 
lor  series.  A  particularly  beautiful 


thing happens with the functions $e^x$, $\sin x$, and $\cos x$:

$$e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \ldots$$

$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \ldots$$

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \ldots$$

If $x = i\phi$ is an imaginary number, we have

$$e^{i\phi} = \cos\phi + i\sin\phi,$$

a result known as Euler’s formula. The geometrical interpretation in
the complex plane is shown in figure f.
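The step from the three series to Euler's formula is left implicit, so here is a sketch of it. It substitutes $x = i\phi$ into the exponential series and uses $i^2 = -1$, $i^3 = -i$, $i^4 = 1$ to sort the terms into real and imaginary parts.

```latex
\begin{align*}
e^{i\phi}
  &= 1 + i\phi + \frac{(i\phi)^2}{2!} + \frac{(i\phi)^3}{3!}
       + \frac{(i\phi)^4}{4!} + \frac{(i\phi)^5}{5!} + \ldots \\
  &= \left(1 - \frac{\phi^2}{2!} + \frac{\phi^4}{4!} - \ldots\right)
     + i\left(\phi - \frac{\phi^3}{3!} + \frac{\phi^5}{5!} - \ldots\right) \\
  &= \cos\phi + i\sin\phi .
\end{align*}
```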


f / The complex number $e^{i\phi}$ lies on the unit circle.


Although  the  result  may  seem  like 
something  out  of  a  freak  show  at 
first,  applying  the  definition2  of  the 

2See  page  151  for  an  explanation  of 
where  this  definition  comes  from  and  why 
it  makes  sense. 
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exponential  function  makes  it  clear 
how  natural  it  is: 


Example  90 

>  Write  the  sine  and  cosine  functions 
in  terms  of  exponentials. 


o  Euler’s  formula  for  x  =  —k b  qives 

/  x\n 

px  =  lim  (l—i _ ]  coscb  - /sin  6,  since  cos(-0)  =  cos0, 

Vi+J  •  and  sin(-0)  = -Sine. 


When  x  =  i<j>  is  imaginary,  the 
quantity  (1  +  i<f>/n)  represents  a 
number  lying  just  above  1  in  the 
complex  plane.  For  large  n,  (1  + 
i(j>/n)  becomes  very  close  to  the 
unit  circle,  and  its  argument  is  the 
small  angle  <p/n.  Raising  this  num¬ 
ber  to  the  nth  power  multiplies  its 
argument  by  n,  giving  a  number 
with  an  argument  of  <fi. 


cosx  = 


e  +  e 


2 


sinx  = 


>  Evaluate 


Example  91 


/ 


cosxdx 


>  Problem  15  on  p.  99  suggested  a 
special-purpose  trick  for  doing  this  in¬ 
tegral.  An  approach  that  doesn’t  rely 
on  tricks  is  to  rewrite  the  cosine  in 
terms  of  exponentials: 


g  /  Leonhard  Euler 
(1707-1783) 


cos  x  dx 


1_ 

2 

2 


+  e 


(i-0* 


1  -  / 


dx 
)  dx 


Since  this  result  is  the  integral  of  a 
real-valued  function,  we’d  like  it  to  be 
real,  and  in  fact  it  is,  since  the  first  and 
second  terms  are  complex  conjugates 
of  one  another.  If  we  wanted  to,  we 
could  use  Euler’s  theorem  to  convert 
it  back  to  a  manifestly  real  result.3 
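
A quick numerical sanity check of example 91, offered as a sketch rather than a proof: it assumes the integrand is $e^x\cos x$ and uses the antiderivative in its complex-exponential form. The function names and the sample point are illustrative only.

```python
import cmath
import math

def antiderivative(x):
    """(1/2) [ e^{(1+i)x}/(1+i) + e^{(1-i)x}/(1-i) ], the complex-exponential form."""
    a = 1 + 1j
    b = 1 - 1j
    return 0.5 * (cmath.exp(a * x) / a + cmath.exp(b * x) / b)

def real_form(x):
    """The same result after applying Euler's formula: e^x (cos x + sin x) / 2."""
    return math.exp(x) * (math.cos(x) + math.sin(x)) / 2

x = 0.9
h = 1e-6
F = antiderivative(x)
# Central-difference estimate of the derivative of the antiderivative.
slope = (antiderivative(x + h) - antiderivative(x - h)).real / (2 * h)
print(F.imag, F.real - real_form(x), slope - math.exp(x) * math.cos(x))
```

All three printed numbers should be close to zero: the imaginary part cancels, the manifestly real form agrees, and differentiating recovers the integrand.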


Euler’s  formula  is  used  frequently 
in  physics  and  engineering. 


3In general, the use of complex number
techniques to do an integral could result
in a complex number, but that complex
number would be a constant, which
could be subsumed within the usual constant
of integration.

Example 92

Euler found the equation

$$\pi = 20\tan^{-1}\frac{1}{7} + 8\tan^{-1}\frac{3}{79},$$

which allowed the computation of $\pi$ to
high precision in the era before electronic
calculators, since the Taylor series
for the inverse tangent converges
rapidly for small inputs. A cute way of
proving the validity of the equation is
to calculate

$$(7 + i)^{20}(79 + 3i)^8$$

as follows in Yacas:

(7+I)^20*(79+3*I)^8;

-149011611938476562500000000000000

The fact that it is purely real, and has
a negative real part, demonstrates
that the quantity on the right side of
the original equation equals $\pi + 2\pi n$,
where n is an integer. Numerical estimation
shows that n = 0. Although the
proof was straightforward, it provides
zero insight into how Euler figured it
out in the first place!
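
The same computation can be reproduced without Yacas by doing exact integer arithmetic on (real, imaginary) pairs. This is a sketch in Python; the helpers `gauss_mul` and `gauss_pow` are hypothetical names introduced here, not part of any library.

```python
import math

def gauss_mul(a, b):
    """Multiply two complex numbers with integer parts, given as (real, imag) pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def gauss_pow(a, n):
    """Raise such a number to a non-negative integer power by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = gauss_mul(result, a)
    return result

product = gauss_mul(gauss_pow((7, 1), 20), gauss_pow((79, 3), 8))
print(product)  # exact real and imaginary parts

# The numerical estimate that settles n = 0.
estimate = 20 * math.atan(1 / 7) + 8 * math.atan(3 / 79)
print(estimate - math.pi)
```

Because Python integers have arbitrary precision, the imaginary part comes out exactly zero rather than merely small.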

8.3  Partial  fractions 
revisited 

Suppose we want to evaluate the
integral

$$\int\frac{dx}{x^2+1}$$

by the method of partial fractions.
The quadratic formula tells us that
the roots are $i$ and $-i$. Setting
$1/(x^2+1) = A/(x+i) + B/(x-i)$
gives $A = i/2$ and $B = -i/2$, so

$$\begin{aligned}
\int\frac{dx}{x^2+1} &= \frac{i}{2}\int\frac{dx}{x+i} - \frac{i}{2}\int\frac{dx}{x-i} \\
&= \frac{i}{2}\ln(x+i) - \frac{i}{2}\ln(x-i) \\
&= \frac{i}{2}\ln\frac{x+i}{x-i}.
\end{aligned}$$

The  attractive  thing  about  this  ap¬ 
proach,  compared  with  the  method 
used  on  page  88,  is  that  it  doesn’t 
require  any  tricks.  If  you  came 
across  this  integral  ten  years  from 
now,  you  could  pull  out  your  old 
calculus  book,  flip  through  it,  and 
say, “Oh, here we go, there’s a way
to integrate one over a polynomial:
partial fractions.” On the other
hand,  it’s  odd  that  we  started  out 
trying  to  evaluate  an  integral  that 
had  nothing  but  real  numbers,  and 
came  out  with  an  answer  that  isn’t 
even  obviously  a  real  number. 

But what about that expression
$(x+i)/(x-i)$? Let’s give it a name,
$w$. The numerator and denominator
are complex conjugates of one
another. Since they have the same
magnitude, we must have $|w| = 1$,
i.e., $w$ is a complex number that
lies on the unit circle, the kind of
complex number that Euler’s formula
refers to. The numerator
has an argument of
$\tan^{-1}(1/x) = \pi/2 - \tan^{-1}x$,
and the denominator
has the same argument but
with the opposite sign. Division
means subtracting arguments, so
$\arg w = \pi - 2\tan^{-1}x$. That means
that the result can be rewritten using
Euler’s formula as

$$\begin{aligned}
\int\frac{dx}{x^2+1} &= \frac{i}{2}\ln e^{i(\pi - 2\tan^{-1}x)} \\
&= \frac{i}{2}\cdot i(\pi - 2\tan^{-1}x) \\
&= \tan^{-1}x + c.
\end{aligned}$$


In  other  words,  it’s  the  same  result 
we  found  before,  but  found  with¬ 
out  the  need  for  trickery. 
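
As an informal check, the complex logarithm form can be compared with the inverse tangent numerically. The sketch below uses Python's `cmath.log`, which takes the principal branch, and it is restricted to positive x, where the constant of integration stays the same from point to point; the function name `complex_form` is made up for the illustration.

```python
import cmath
import math

def complex_form(x):
    """(i/2) ln((x + i)/(x - i)), using the principal branch of the logarithm."""
    w = (x + 1j) / (x - 1j)
    return 0.5j * cmath.log(w)

for x in (0.5, 1.0, 2.0, 10.0):
    value = complex_form(x)
    # The imaginary part should vanish, and the real part should differ
    # from atan(x) by the same constant at every x.
    print(x, value.imag, value.real - math.atan(x))
```

For these inputs the difference printed in the last column is the constant $-\pi/2$, which is absorbed into c.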

Example  93 

> Evaluate $\int dx/\sin x$.

> This can be tackled by rewriting the
sine function in terms of complex exponentials,
changing variables to $u = e^{ix}$,
and then using partial fractions.

$$\begin{aligned}
\int\frac{dx}{\sin x} &= 2i\int\frac{dx}{e^{ix}-e^{-ix}} \\
&= 2i\int\frac{du/iu}{u - 1/u} \\
&= 2\int\frac{du}{u^2-1} \\
&= \int\frac{du}{u-1} - \int\frac{du}{u+1} \\
&= \ln(u-1) - \ln(u+1) + c \\
&= \ln(i\tan(x/2)) + c \\
&= \ln\tan(x/2) + c'
\end{aligned}$$
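
Two steps of example 93 are easy to test numerically: that $(u-1)/(u+1)$ with $u = e^{ix}$ is $i\tan(x/2)$, and that the derivative of $\ln\tan(x/2)$ is $1/\sin x$. The following sketch assumes $0 < x < \pi$ so that the logarithm is real; the sample point is arbitrary.

```python
import cmath
import math

def antiderivative(x):
    """ln tan(x/2), valid for 0 < x < pi."""
    return math.log(math.tan(x / 2))

x = 1.1
h = 1e-6
# Central-difference estimate of the derivative of ln tan(x/2).
slope = (antiderivative(x + h) - antiderivative(x - h)) / (2 * h)

u = cmath.exp(1j * x)
ratio = (u - 1) / (u + 1)

print(slope, 1 / math.sin(x))
print(ratio, 1j * math.tan(x / 2))
```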
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Problems 


>  Solution,  p.  195 


1 Find $\arg i$, $\arg(-i)$, and $\arg 37$,
where $\arg z$ denotes the argument
of the complex number $z$.

2 Visualize the following multiplications
in the complex plane
using the interpretation of multiplication
in terms of multiplying
magnitudes and adding arguments:
$(i)(i) = -1$, $(i)(-i) = 1$,
$(-i)(-i) = -1$.

3 If we visualize $z$ as a point in
the complex plane, how should we
visualize $-z$?


4 Find four different complex
numbers $z$ such that $z^4 = 1$.

5 Compute the following:

$$|1+i|, \qquad \arg(1+i), \qquad \left|\frac{1}{1+i}\right|, \qquad \arg\left(\frac{1}{1+i}\right)$$


6 Write the function $\tan x$ in
terms of complex exponentials.

7 Evaluate $\int\sin^3 x\,dx$.

8 Use Euler’s theorem to derive
the addition theorems that express
$\sin(a + b)$ and $\cos(a + b)$ in terms
of the sines and cosines of $a$ and $b$.

> Solution, p. 196

9  Evaluate 

/2 

cosx  cos2xdx. 


10 Find every complex number
$z$ such that $z^3 = 1$.

> Solution, p. 196

11 Factor the expression $x^3 - y^3$
into factors of the lowest possible
order, using complex coefficients.
(Hint: use the result of problem
10.) Then do the same using real
coefficients. > Solution, p. 196

12 Evaluate

$$\int\frac{dx}{x^3 - x^2 + 4x - 4}$$

13 Evaluate

$$\int e^{-ax}\cos bx\,dx.$$


14 Consider the equation
$f'(x) = f(f(x))$. This is known
as a differential equation: an equation
that relates a function to its
own derivatives. What is unusual
about this differential equation is
that the right-hand side involves
the function nested inside itself.
Given, for example, the value of
$f(0)$, we expect the solution of
this equation to exist and to be
uniquely defined for all values of $x$.
That doesn’t mean, however, that
we can write down such a solution
as a closed-form expression. Show
that two closed-form expressions
do exist, of the form $f(x) = ax^b$,
and find the two values of $b$.

> Solution, p. 196

15 (a) Discuss how the integral

$$\int\frac{dx}{x^{10000} - 1}$$

could be evaluated, in principle, in
closed form. (b) See what happens
when you try to evaluate it using
computer software. (c) Express it
as a finite sum.

> Solution, p. 197 ★


CHAPTER  8.  COMPLEX  NUMBER  TECHNIQUES 


9  Iterated  integrals 


9.1  Integrals  inside 
integrals 

In  various  applications,  you  need 
to  do  integrals  stuck  inside  other 
integrals.  These  are  known  as  it¬ 
erated  integrals,  or  double  inte¬ 
grals,  triple  integrals,  etc.  Simi¬ 
lar  concepts  crop  up  all  the  time 
even  when  you’re  not  doing  cal¬ 
culus,  so  let’s  start  by  imagining 
such  an  example.  Suppose  you 
want  to  count  how  many  squares 
there  are  on  a  chess  board,  and  you 
don’t  know  how  to  multiply  eight 
times  eight.  You  could  start  from 
the  upper  left,  count  eight  squares 
across, then continue with the second
row, and so on, until you
have counted every square, giving
the result of 64. In slightly more
formal mathematical language, we
could write the following recipe:
for each row, r, from 1 to 8, consider
the columns, c, from 1 to 8,
and add one to the count for each
one of them. Using the sigma notation,
this becomes

$$\sum_{r=1}^{8}\sum_{c=1}^{8} 1.$$


If  you’re  familiar  with  computer 
programming,  then  you  can  think 
of  this  as  a  sum  that  could  be 
calculated  using  a  loop  nested  in¬ 
side  another  loop.  To  evaluate  the 
result  (again,  assuming  we  don’t 


know  how  to  multiply,  so  we  have 
to  use  brute  force),  we  can  first 
evaluate  the  inside  sum,  which 
equals  8,  giving 

$$\sum_{r=1}^{8} 8.$$


Notice  how  the  “dummy”  variable 
c  has  disappeared.  Finally  we  do 
the  outside  sum,  over  r,  and  find 
the  result  of  64. 
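
The loop-inside-a-loop picture can be written out directly. This is only a sketch of the chess board count in Python, with variable names chosen to match the text.

```python
count = 0
for r in range(1, 9):        # outer sum over rows r = 1..8
    for c in range(1, 9):    # inner sum over columns c = 1..8
        count += 1           # the summand is 1

# Doing the inner sum first leaves a single sum over r of the value 8;
# the dummy variable c no longer appears.
inner = sum(1 for c in range(1, 9))
outer = sum(inner for r in range(1, 9))
print(count, inner, outer)
```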

Now  imagine  doing  the  same  thing 
with  the  pixels  on  a  TV  screen. 
The  electron  beam  sweeps  across 
the  screen,  painting  the  pixels  in 
each  row,  one  at  a  time.  This  is  re¬ 
ally  no  different  than  the  example 
of  the  chess  board,  but  because  the 
pixels  are  so  small,  you  normally 
think  of  the  image  on  a  TV  screen 
as  continuous  rather  than  discrete. 
This  is  the  idea  of  an  integral  in 
calculus.  Suppose  we  want  to  find 
the area of a rectangle of width a
and height b, and we don’t know
that we can just multiply to get
the area ab. The brute force way
to do this is to break up the rectangle
into a grid of infinitesimally
small squares, each having width
dx and height dy, and therefore the
infinitesimal area dA = dx dy. For
convenience, we’ll imagine that the
rectangle’s lower left corner is at
the origin. Then the area is given
by this integral:

$$\text{area} = \int_{y=0}^{b}\int_{x=0}^{a} dA = \int_{y=0}^{b}\int_{x=0}^{a} dx\,dy$$

Notice how the leftmost integral
sign, over y, and the rightmost
differential, dy, act like bookends,
or the pieces of bread on a sandwich.
Inside them, we have the integral
sign that runs over x, and
the differential dx that matches it
on the right. Finally, on the innermost
layer, we’d normally have the
thing we’re integrating, but here
it’s 1, so I’ve omitted it. Writing
the lower limits of the integrals
with x = and y = helps to keep
it straight which integral goes with
which differential. The result is


$$\begin{aligned}
\text{area} &= \int_{y=0}^{b}\int_{x=0}^{a} dA \\
&= \int_{y=0}^{b}\left(\int_{x=0}^{a} dx\right)dy \\
&= \int_{y=0}^{b} a\,dy \\
&= a\int_{y=0}^{b} dy \\
&= ab.
\end{aligned}$$


Area of a triangle Example 94

> Find the area of a 45-45-90 right triangle
having legs a.

> Let the triangle’s hypotenuse run
from the origin to the point (a, a), and
let its legs run from the origin to (0, a),
and then to (a, a). In other words, the
triangle sits on top of its hypotenuse.
Then the integral can be set up the
same way as the one before, but for a
particular value of y, values of x only
run from 0 (on the y axis) to y (on the
hypotenuse). We then have

$$\begin{aligned}
\text{area} &= \int_{y=0}^{a}\int_{x=0}^{y} dA \\
&= \int_{y=0}^{a}\int_{x=0}^{y} dx\,dy \\
&= \int_{y=0}^{a}\left(\int_{x=0}^{y} dx\right)dy \\
&= \int_{y=0}^{a} y\,dy \\
&= \frac{a^2}{2}.
\end{aligned}$$

Note that in this example, because the
upper end of the x values depends
on the value of y, it makes a difference
which order we do the integrals
in. The x integral has to be on the inside,
and we have to do it first.


Volume of a cube Example 95

> Find the volume of a cube with sides
of length a.

> This is a three-dimensional example,
so we’ll have integrals nested three
deep, and the thing we’re integrating
is the volume dV = dx dy dz.

$$\begin{aligned}
\text{volume} &= \int_{z=0}^{a}\int_{y=0}^{a}\int_{x=0}^{a} dV \\
&= \int_{z=0}^{a}\int_{y=0}^{a}\int_{x=0}^{a} dx\,dy\,dz \\
&= \int_{z=0}^{a}\int_{y=0}^{a} a\,dy\,dz \\
&= a\int_{z=0}^{a}\int_{y=0}^{a} dy\,dz \\
&= a\int_{z=0}^{a} a\,dz \\
&= a^2\int_{z=0}^{a} dz \\
&= a^3.
\end{aligned}$$
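
The triangle of example 94 shows what it means for an inner limit to depend on the outer variable. A rough numerical version is sketched below, replacing each integral with a midpoint Riemann sum; the function name `triangle_area` and the grid size are arbitrary choices.

```python
def triangle_area(a, n=400):
    """Midpoint Riemann sum for the iterated integral over 0 < y < a, 0 < x < y of dx dy."""
    dy = a / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * dy       # midpoint of the j-th strip in y
        dx = y / n               # the x range runs only from 0 to y
        inner = 0.0
        for i in range(n):
            inner += 1.0 * dx    # the integrand is 1
        total += inner * dy      # the inner result depends on y, so it is done first
    return total

a = 3.0
print(triangle_area(a), a**2 / 2)
```

The inner loop has to sit inside the outer one because its step size is computed from the current y, which mirrors the remark that the x integral must be on the inside.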


Area of a circle Example 96

> Find the area of a circle.

> To make it easy, let’s find the area
of a semicircle and then double it. Let
the circle’s radius be R, and let it be
centered on the origin and bounded
below by the x axis. Then the curved
edge is given by the equation
$R^2 = x^2 + y^2$, or $y = \sqrt{R^2 - x^2}$. Since the
y integral’s limit depends on x, the x
integral has to be on the outside. The
area is

$$\begin{aligned}
\text{area} &= \int_{x=-R}^{R}\int_{y=0}^{\sqrt{R^2-x^2}} dy\,dx \\
&= \int_{x=-R}^{R}\sqrt{R^2-x^2}\,dx \\
&= R\int_{x=-R}^{R}\sqrt{1-(x/R)^2}\,dx.
\end{aligned}$$

Substituting $u = x/R$,

$$\text{area} = R^2\int_{u=-1}^{1}\sqrt{1-u^2}\,du.$$

The definite integral equals $\pi/2$, as you
can find using a trig substitution or
simply by looking it up in a table, and
the result is, as expected, $\pi R^2/2$ for
the area of the semicircle. Doubling it,
we find the expected result of $\pi R^2$ for
a full circle.

9.2 Applications

Up until now, the integrand of the
innermost integral has always been
1, so we really could have done all
the double integrals as single integrals.
The following example is one
in which you really need to do iterated
integrals.



a  /  The  famous  tightrope 
walker  Charles  Blondin 
uses  a  long  pole  for  its 
large  moment  of  inertia. 

Moments of inertia Example 97
The moment of inertia is a measure
of how difficult it is to start an object
rotating (or stop it). For example,
tightrope walkers carry long poles because
they want something with a big
moment of inertia. The moment of inertia
is defined by $I = \int R^2\,dm$, where
dm is the mass of an infinitesimally
small portion of the object, and R is
the distance from the axis of rotation.

To start with, let’s do an example that
doesn’t require iterated integrals. Let’s
calculate the moment of inertia of a
thin rod of mass M and length L about
a line perpendicular to the rod and
passing through its center.

$$\begin{aligned}
I &= \int R^2\,dm \\
&= \int_{-L/2}^{L/2} x^2\,\frac{M}{L}\,dx \qquad [R = |x|, \text{ so } R^2 = x^2] \\
&= \frac{1}{12}ML^2
\end{aligned}$$

Now let’s do one that requires iterated
integrals: the moment of inertia
of a cube of side b, for rotation about
an axis that passes through its center
and is parallel to four of its faces.

Let the origin be at the center of the
cube, and let x be the rotation axis.

$$\begin{aligned}
I &= \int R^2\,dm \\
&= \rho\int R^2\,dV \\
&= \rho\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}(y^2+z^2)\,dx\,dy\,dz \\
&= \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}(y^2+z^2)\,dy\,dz
\end{aligned}$$

The fact that the last step is a trivial integral
results from the symmetry of the
problem. The integrand of the remaining
double integral breaks down into
two terms, each of which depends on
only one of the variables, so we break
it into two integrals,

$$I = \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} y^2\,dy\,dz + \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} z^2\,dy\,dz,$$

which we know have identical results.
We therefore only need to evaluate
one of them and double the result:

$$\begin{aligned}
I &= 2\rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} z^2\,dy\,dz \\
&= 2\rho b^2\int_{-b/2}^{b/2} z^2\,dz \\
&= \frac{1}{6}\rho b^5 \\
&= \frac{1}{6}Mb^2
\end{aligned}$$
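
The cube result of example 97 can be cross-checked by brute force. The sketch below assumes a uniform density, takes the x integral as the trivial factor of b, and approximates the remaining double integral with midpoint sums; the function name and the sample values of M and b are invented for the illustration.

```python
def cube_moment_of_inertia(M, b, n=200):
    """Midpoint-rule estimate of rho * integral of (y^2 + z^2) over a cube of side b.

    The x integral contributes a factor of b because the integrand does not
    depend on x, so only the y and z integrals are done numerically.
    """
    rho = M / b**3                     # uniform density (an assumption)
    d = b / n
    mids = [-b / 2 + (k + 0.5) * d for k in range(n)]
    total = 0.0
    for y in mids:
        for z in mids:
            total += (y * y + z * z) * d * d
    return rho * b * total

M, b = 2.0, 3.0
print(cube_moment_of_inertia(M, b), M * b**2 / 6)
```

The two printed numbers agree to a few parts in a hundred thousand at this grid size, and the agreement improves as n is increased.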
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9.3  Polar  coordinates 


b  /  Rene  Descartes 
(1596-1650) 


Philosopher and mathematician
Rene Descartes originated the idea
of describing plane geometry using
(x, y) coordinates measured from
a pair of perpendicular coordinate
axes. These rectangular coordinates
are known as Cartesian coordinates,
in his honor.


As a logical extension of Descartes’
idea, one can find different ways of
defining coordinates on the plane,
such as the polar coordinates in figure
c. In polar coordinates, the
differential of area, figure d, can be
written as $da = R\,dR\,d\phi$. The idea
is that since $dR$ and $d\phi$ are
infinitesimally small, the shaded area
in the figure is very nearly a rectangle,
measuring $dR$ in one dimension
and $R\,d\phi$ in the other. (The
latter follows from the definition of
radian measure.)


d / The differential of
area in polar coordinates

Example  98 

>  A  disk  has  mass  M  and  radius  b. 
Find  its  moment  of  inertia  for  rota¬ 
tion  about  the  axis  passing  perpendic¬ 
ularly  through  its  center. 
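
One way the calculation for example 98 could be set up is $I = \int R^2\,dm$ with $dm = \sigma\,da = \sigma R\,dR\,d\phi$, which for a uniform disk leads to $Mb^2/2$. The Python sketch below checks that value numerically under the assumption of uniform mass per unit area $\sigma = M/\pi b^2$; the function name and sample values are illustrative only.

```python
import math

def disk_moment_of_inertia(M, b, n=2000):
    """Estimate the integral of R^2 dm over a uniform disk, with dm = sigma * R dR dphi."""
    sigma = M / (math.pi * b**2)       # mass per unit area, assumed uniform
    dR = b / n
    radial = 0.0
    for k in range(n):
        R = (k + 0.5) * dR             # midpoint of the k-th ring
        radial += R**2 * R * dR        # R^2 from the definition, extra R from da
    return sigma * 2 * math.pi * radial  # the phi integral just gives 2*pi

M, b = 1.5, 0.4
print(disk_moment_of_inertia(M, b), M * b**2 / 2)
```

The extra factor of R in the integrand is the one that comes from the differential of area in polar coordinates.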
e / The function $e^{-x^2}$, example 99.

Example 99

In statistics, the standard “bell curve”
(also known as the normal distribution
or Gaussian) is shaped like $e^{-x^2}$. An
area under this curve is proportional
to the probability that x lies within a
certain range. To fix the constant of
proportionality, we need to evaluate

$$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx,$$

which corresponds to a probability of
1. As discussed on p. 95, the corresponding
indefinite integral can’t be
done in closed form. The definite integral
from $-\infty$ to $+\infty$, however, can
be evaluated by the following devious
trick due to Poisson. We first write $I^2$
as a product of two copies of the integral.

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)$$

Since the variable of integration x is
a “dummy” variable, we can choose it
to be any letter of the alphabet. Let’s
change the second one to y:

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right)$$

This is in principle a pointless and trivial
change, but it suggests visualizing
the right-hand side in the Cartesian
plane, and considering it as the integral
of a single function that depends
on both x and y:

$$I^2 = \int_{y=-\infty}^{\infty}\int_{x=-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy$$

Switching to polar coordinates, we
have

$$\begin{aligned}
I^2 &= \int_{0}^{2\pi}\int_{0}^{\infty} e^{-R^2}\,R\,dR\,d\phi \\
&= 2\pi\int_{0}^{\infty} e^{-R^2}\,R\,dR,
\end{aligned}$$

which can be done using the substitution
$u = R^2$, $du = 2R\,dR$:

$$\begin{aligned}
I^2 &= 2\pi\int_{0}^{\infty} e^{-u}\,(du/2) \\
&= \pi \\
I &= \sqrt{\pi}
\end{aligned}$$
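
The value $\sqrt{\pi}$ is easy to confirm numerically. The sketch below uses a plain midpoint sum and cuts the infinite range off at $\pm 8$, which is an assumption justified by how quickly $e^{-x^2}$ dies away; the function name and parameters are illustrative.

```python
import math

def gaussian_integral(limit=8.0, n=4000):
    """Midpoint-rule estimate of the integral of exp(-x^2) from -limit to +limit."""
    dx = 2 * limit / n
    total = 0.0
    for k in range(n):
        x = -limit + (k + 0.5) * dx
        total += math.exp(-x * x) * dx
    return total

I = gaussian_integral()
print(I, math.sqrt(math.pi), I**2)
```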
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9.4  Spherical  and 
cylindrical 
coordinates 

In cylindrical coordinates $(R, \phi, z)$,
z measures distance along the axis,
R measures distance from the axis,
and $\phi$ is an angle that wraps
around the axis.


f / Cylindrical coordinates.


The differential of volume in cylindrical
coordinates can be written
as $dv = R\,dR\,dz\,d\phi$. This follows
from adding a third dimension,
along the z axis, to the rectangle
in figure d.

Example 100

> Show that the expression for dv has
the right units.

> Angles are unitless, since the definition
of radian measure involves a distance
divided by a distance. Therefore
the only factors in the expression
that have units are R, dR, and dz. If
these three factors are measured, say,
in meters, then their product has units
of cubic meters, which is correct for a
volume.

Example 101

> Find the volume of a cone whose
height is h and whose base has radius
b.

> Let’s plan on putting the z integral
on the outside of the sandwich. That
means we need to express the radius
$r_{max}$ of the cone in terms of z. This
comes out nice and simple if we imagine
the cone upside down, with its tip
at the origin. Then since we have
$r_{max}(z=0) = 0$, and $r_{max}(h) = b$,
evidently $r_{max} = zb/h$.

$$\begin{aligned}
v &= \int dv \\
&= \int_{z=0}^{h}\int_{R=0}^{zb/h}\int_{\phi=0}^{2\pi} R\,d\phi\,dR\,dz \\
&= 2\pi\int_{z=0}^{h}\int_{R=0}^{zb/h} R\,dR\,dz \\
&= 2\pi\int_{z=0}^{h}\frac{(zb/h)^2}{2}\,dz \\
&= \pi(b/h)^2\int_{z=0}^{h} z^2\,dz \\
&= \frac{\pi b^2 h}{3}
\end{aligned}$$

As a check, we note that the answer
has units of volume. This is the classical
result, known by the ancient Egyptians,
that a cone has one third the volume
of its enclosing cylinder.
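
The cone of example 101 can also be done by brute force, following the same nesting order as the sandwich of integrals. This is a sketch using midpoint sums, with the function name and the sample dimensions chosen arbitrarily.

```python
import math

def cone_volume(b, h, n=300):
    """Iterated midpoint sums for the integral of R dphi dR dz over an inverted cone."""
    dz = h / n
    total = 0.0
    for j in range(n):
        z = (j + 0.5) * dz
        r_max = z * b / h              # radius of the cone at height z
        dR = r_max / n
        inner = 0.0
        for i in range(n):
            R = (i + 0.5) * dR
            inner += R * dR            # the factor R comes from dv = R dR dz dphi
        total += 2 * math.pi * inner * dz   # the phi integral contributes 2*pi
    return total

b, h = 1.0, 2.0
print(cone_volume(b, h), math.pi * b**2 * h / 3)
```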

In spherical coordinates $(r, \theta, \phi)$,
the coordinate r measures the distance
from the origin, and $\theta$ and $\phi$
are analogous to latitude and longitude,
except that $\theta$ is measured
down from the pole rather than
from the equator.


g / Spherical coordinates.


The differential of volume in
spherical coordinates is
$dv = r^2\sin\theta\,dr\,d\theta\,d\phi$.

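Example 102

> Find the volume of a sphere.

>

$$\begin{aligned}
v &= \int dv \\
&= \int_{\theta=0}^{\pi}\int_{r=0}^{b}\int_{\phi=0}^{2\pi} r^2\sin\theta\,d\phi\,dr\,d\theta \\
&= 2\pi\int_{\theta=0}^{\pi}\int_{r=0}^{b} r^2\sin\theta\,dr\,d\theta \\
&= 2\pi\cdot\frac{b^3}{3}\int_{\theta=0}^{\pi}\sin\theta\,d\theta \\
&= \frac{4\pi b^3}{3}
\end{aligned}$$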
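
Because the integrand $r^2\sin\theta$ is a product of a function of r and a function of $\theta$, the sphere's volume factorizes into separate one-dimensional sums. The sketch below exploits that, taking b as the sphere's radius; the function name and grid size are arbitrary.

```python
import math

def sphere_volume(b, n=400):
    """Midpoint sums for the integral of r^2 sin(theta) dphi dr dtheta over a ball of radius b."""
    dr = b / n
    dtheta = math.pi / n
    radial = sum(((i + 0.5) * dr) ** 2 * dr for i in range(n))            # about b^3 / 3
    polar = sum(math.sin((j + 0.5) * dtheta) * dtheta for j in range(n))  # about 2
    return 2 * math.pi * radial * polar   # the integrand factorizes, so the sums multiply

b = 1.5
print(sphere_volume(b), 4 * math.pi * b**3 / 3)
```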
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Problems 

1  Pascal’s  snail  (named  after 
Etienne  Pascal,  father  of  Blaise 
Pascal)  is  the  shape  shown  in  the 
figure, defined by $R = b(1 + \cos\theta)$
in polar coordinates.

(a)  Make  a  rough  visual  estimate 
of  its  area  from  the  figure. 

(b)  Find  its  area  exactly,  and  check 
against  your  result  from  part  a. 

(c)  Show  that  your  answer  has  the 
right  units.  [Thompson,  1919] 


Problem  1 :  Pascal’s  snail  with  b  =  1 . 


2 A cone with a curved base is
defined by $r \le b$ and $\theta \le \pi/4$ in
spherical coordinates.

(a)  Find  its  volume. 

(b)  Show  that  your  answer  has  the 
right  units. 

3  Find  the  moment  of  inertia  of 
a  sphere  for  rotation  about  an  axis 
passing  through  its  center. 

4  A  jump-rope  swinging  in  circles 
has  the  shape  of  a  sine  function. 


Find the volume enclosed by the
swinging rope, in terms of the radius
b of the circle at the rope’s
fattest point, and the straight-line
distance t between the ends.

5 A curvy-sided cone is defined in
cylindrical coordinates by $0 \le z \le h$
and $R \le kz^2$. (a) What units
are implied for the constant k? (b)
Find the volume of the shape. (c)
Check that your answer to b has
the right units.

6  The  discovery  of  nuclear  fis¬ 
sion  was  originally  explained  by 
modeling  the  atomic  nucleus  as  a 
drop  of  liquid.  Like  a  water  bal¬ 
loon,  the  drop  could  spin  or  vi¬ 
brate,  and  if  the  motion  became 
sufficiently  violent,  the  drop  could 
split  in  half  —  undergo  fission.  It 
was  later  learned  that  even  the 
nuclei  in  matter  under  ordinary 
conditions  are  often  not  spherical 
but  deformed,  typically  with  an 
elongated  ellipsoidal  shape  like  an 
American  football.  One  simple 
way  of  describing  such  a  shape  is 
with  the  equation 

$$r \le b[1 + c(\cos^2\theta - k)],$$

where c = 0 for a sphere, c > 0 for
an elongated shape, and c < 0 for
a flattened one. Usually for nuclei
in ordinary matter, c ranges from
about 0 to +0.2. The constant k
is introduced because without it, a
change in c would entail not just
a change in the shape of the nucleus,
but a change in its volume
as well. Observations show, on the
contrary, that the nuclear fluid is
highly incompressible, just like ordinary
water, so the volume of the
nucleus is not expected to change
significantly, even in violent processes
like fission. Calculate the
volume of the nucleus, throwing
away terms of order $c^2$ or higher,
and show that k = 1/3 is required
in order to keep the volume constant.

7 This problem is a continuation
of problem 6, and assumes the
result of that problem is already
known. The nucleus $^{168}$Er has the
type of elongated ellipsoidal shape
described in that problem, with
c > 0. Its mass is $2.8 \times 10^{-25}$ kg,
it is observed to have a moment
of inertia of $2.62 \times 10^{-54}$ kg·m²
for end-over-end rotation, and its
shape is believed to be described
by $b \approx 6 \times 10^{-15}$ m and $c \approx 0.2$.

Assuming that it rotated rigidly,
the usual equation for the moment
of inertia could be applicable, but
it may rotate more like a water balloon,
in which case its moment of
inertia would be significantly less
because not all the mass would actually
flow. Test which type of rotation
it is by calculating its moment
of inertia for end-over-end rotation
and comparing with the observed
moment of inertia. ★

8 Von Karman found empirically
that when a fluid flows turbulently
through a cylindrical pipe, the velocity
of flow v varies according
to the “1/7 power law,”
$v/v_0 = (1 - r/R)^{1/7}$, where $v_0$ is the velocity
at the center of the pipe, R is
the radius of the pipe, and r is the
distance from the axis. Find the
average velocity at which water is
transported through the pipe.
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Formal  definition  of  the  tangent  line 

Given a function x(t), consider any point P = (a, x(a)) on its graph.
Let the function $\ell(t)$ be a line passing through P. We say that $\ell$ cuts
through x at P if there exists some real number d > 0 such that the
graph of $\ell$ is on one side of the graph of x for all a − d < t < a, and is
on the other side for all a < t < a + d.

Definition (Marsden1): A line $\ell$ through P is said to be the line tangent
to x at P if all lines through P with slopes less than that of $\ell$ cut
through x in one direction, while all lines with slopes greater than $\ell$’s
cut through it in the opposite direction.

The reason for the complication in the definition is that there are cases
in which the function is smooth and well-behaved throughout a certain
region, but for a certain point P in that region, all lines through P cut
through x. For example, the function $x(t) = t^3$ is blessed everywhere
with lines that don’t cut through it; everywhere, that is, except at
t = 0, which is an inflection point (p. 17). Our definition fills in the
“gap tooth” in the derivative function in the obvious way.

Example 103

As an example, we demonstrate that the derivative of $t^3$ is zero where it passes
through the origin. Define the line $\ell(t) = bt$ with slope b, passing through the
origin. For b < 0, $\ell$ cuts the graph of $t^3$ once at the origin, going down and to
the right. For b > 0, $\ell$ cuts the graph of $t^3$ in three places, at t = 0 and $\pm\sqrt{b}$.
Picking any positive value of d less than $\sqrt{b}$, we find that $\ell$ cuts the graph at
the origin, going up and to the right. Therefore b = 0 gives the tangent line at
the origin.
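
The cutting behavior described in example 103 can be seen by comparing the line with the curve just to the left and just to the right of the origin. This is an informal sketch; the function name `cut_direction` and the small offset `d` are choices made for the illustration.

```python
def cut_direction(b, d=1e-3):
    """Report whether the line b*t lies above t**3 just left and just right of t = 0."""
    left = b * (-d) - (-d) ** 3     # line minus curve at t = -d
    right = b * d - d ** 3          # line minus curve at t = +d
    return (left > 0, right > 0)

print(cut_direction(-0.5))  # negative slope: above on the left, below on the right
print(cut_direction(0.5))   # positive slope: below on the left, above on the right
```

Lines with slopes on either side of zero cut through in opposite directions, which is what singles out b = 0 as the slope of the tangent line.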

1 Calculus Unlimited, by Jerrold Marsden and Alan Weinstein,
http://resolver.caltech.edu/CaltechBOOK:1981.001
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Derivatives  of  polynomials 

Some  ideas  in  this  proof  are  due  to  Tom  Goodwillie. 

Theorem: For n = 0, 1, 2, ..., the derivative of the function x defined
by $x(t) = t^n$ is $\dot{x} = nt^{n-1}$.

The results for n = 0 and 1 hold by direct application of the definition
of the derivative.

For n > 1, it suffices to prove $\dot{x}(0) = 0$ and $\dot{x}(1) = n$, since the result for
other nonzero values of t then follows by the kind of scaling argument
used on page 13 for the n = 2 case.

We  use  the  following  properties  of  the  derivative,  all  of  which  follow 
immediately  from  its  definition  as  the  slope  of  the  tangent  line: 

Shift. Shifting a function x(t) horizontally to form a new function x(t+c)
gives  a  derivative  at  any  newly  shifted  point  that  is  the  same  as 
the  derivative  at  the  corresponding  point  on  the  unshifted  graph. 

Flip.  Flipping  the  function  x(t)  to  form  a  new  function  x(—t)  negates 
its  derivative  at  t  =  0. 

Add.  The  derivative  of  the  sum  or  difference  of  two  functions  is  the  sum 
or  difference  of  their  derivatives. 

For even n, $\dot{x}(0) = 0$ follows from the flip property, since x(−t) is the
same function as x(t). For n = 3, 5, ..., we apply the definition of the
derivative in the same manner as was done in the preceding section for
n = 3.

We now need to show that $\dot{x}(1) = n$. Define the function u as

$$\begin{aligned}
u(t) &= x(t+1) - x(t) \\
&= 1 + nt + \ldots,
\end{aligned}$$

where the second line follows from the binomial theorem, and
... represents terms involving $t^2$ and higher powers. Since we’ve already
established the results for n = 0 and 1, differentiation gives

$$\dot{u}(t) = n + \ldots.$$

Now let’s evaluate this at t = 0, where, as shown earlier, the terms
represented by ... all vanish. Applying the add and shift properties, we
have

$$\dot{x}(1) - \dot{x}(0) = n.$$

But since $\dot{x}(0) = 0$, this completes the proof.

Although  this  proof  was  for  integer  exponents  n  >  1,  the  result  is  also 
true  for  any  real  value  of  n;  see  example  24  on  p.  41. 


Details  of  the  proof  of  the  derivative  of  the  sine  function 

Some  ideas  in  this  proof  are  due  to  Jerome  Keisler  (see  references,  p. 

201). 

On page 28, I computed the derivative of sin t to be cos t as follows:

$$\begin{aligned}
dx &= \sin(t + dt) - \sin t \\
&= \sin t\cos dt + \cos t\sin dt - \sin t \\
&= \cos t\,dt + \ldots
\end{aligned}$$

We want to prove that the error “...” introduced by the small-angle
approximations really is of order $dt^2$.

A  quick  and  dirty  way  to  check  whether  this  is  likely  to  be  true  is  to 
use  Inf  to  calculate  sin(t  +  dt)  at  some  specific  value  of  t.  For  example, 
at  t  =  1  we  have  this  result: 


: sin(1+d)

(0.84147)+(0.54030)d
+(-0.42074)d^2+(-0.09006)d^3
+(0.03506)d^4


The small-angle approximations give sin(1 + d) ≈ sin 1 + (cos 1)d. The
coefficients of the first two terms of the exact result are, as expected,
sin(1) = 0.84147 and cos(1) = 0.5403..., so although the small-angle
approximations have introduced some errors, they involve only higher
powers of dt, as claimed.
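
For readers without Inf at hand, similar coefficients can be generated in Python from the derivatives of the sine at 1, which cycle through sin, cos, −sin, −cos. This is a sketch of a plain Taylor expansion, not a reproduction of how Inf works internally, and the last printed digit may differ slightly from the listing above.

```python
import math

# Derivatives of sin evaluated at 1 cycle through sin, cos, -sin, -cos.
cycle = [math.sin(1), math.cos(1), -math.sin(1), -math.cos(1)]
coefficients = [cycle[k % 4] / math.factorial(k) for k in range(5)]
print([round(c, 5) for c in coefficients])

# Compare the truncated series with sin(1 + d) for a small d.
d = 0.01
series = sum(c * d**k for k, c in enumerate(coefficients))
print(series - math.sin(1 + d))
```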

The demonstration with Inf has two shortcomings. One is that it only
works for t = 1, but we need to prove the result for all values
of t. That doesn’t mean that the check for t = 1 was useless. Even
though a general mathematical statement about all numbers can never
be proved by demonstrating specific examples for which it succeeds, a
single counterexample suffices to disprove it. The check for t = 1 was
worth doing, because if the first term had come out to be 0.88888, it
would have immediately disproved our claim, thereby saving us from
wasting hours attempting to prove something that wasn’t true.

The other problem is that I’ve never explained how Inf calculates this
kind of thing. The answer is that it uses something called a Taylor
series, discussed in section 7.4. Using Inf here without knowing yet
how Taylor series work is like using your calculator as a “black box”
to extract the square root of 2 without knowing how it does it. Not
knowing the inner workings of the black box makes the demonstration
less than satisfying.

In  any  case,  this  preliminary  check  makes  it  sound  like  it’s  reasonable 
to  go  on  and  try  to  produce  a  real  proof.  We  have 

sin(f  +  df )  =  sin  t  +  cos  t  dt  —  E, 

where  the  error  E  introduced  by  the  approximations  is 

E  =  sinf(l  —  cosdf) 

+  cost(dt  —  sindf). 


Let the radius of the circle in figure a be one, so AD is $\cos dt$ and CD is
$\sin dt$. The area of the shaded pie slice is $dt/2$, and the area of triangle
ABC is $\sin dt/2$, so the error made in the approximation $\sin dt \approx dt$
equals twice the area of the dish shape formed by line BC and arc BC.
Therefore $dt - \sin dt$ is less than the area of rectangle CEBD. But CEBD
has both an infinitesimal width and an infinitesimal height, so this error
is of no more than order $dt^2$.

a / Geometrical interpretation of the error term.

For the approximation $\cos dt \approx 1$, the error (represented by BD) is
$1 - \cos dt = 1 - \sqrt{1 - \sin^2 dt}$, which is less than $1 - \sqrt{1 - dt^2}$, since
$\sin dt < dt$. Therefore this error is of order $dt^2$.

Formal statement of the transfer principle

On  page  33,  I  gave  an  informal  description  of  the  transfer  principle.  The 
idea  being  expressed  was  that  the  phrases  “for  any”  and  “there  exists” 
can  only  be  used  in  phrases  like  “for  any  real  number  x”  and  “there 
exists  a  real  number  y  such  that. . .  ”  The  transfer  principle  does  not 
apply  to  statements  like  “there  exists  an  integer  x  such  that. . .  ”  or 
even  “there  exists  a  subset  of  the  real  numbers  such  that. . .  ” 

The  way  to  state  the  transfer  principle  more  rigorously  is  to  get  rid  of 
the ambiguities of the English language by restricting ourselves to a well-defined
language of mathematical symbols. This language has symbols
$\forall$ and $\exists$, meaning “for all” and “there exists,” and these are called
quantifiers. A quantifier is always immediately followed by a variable,
and then by a statement involving that variable. For example, suppose
we want to say that a number greater than 1 exists. We can write the
statement $\exists x\; x > 1$, read as “there exists a number x such that x is
greater  than  1.”  We  don’t  actually  need  to  say  “there  exists  a  number 
x  in  the  set  of  real  numbers  such  that  . . . ,”  because  our  intention  here 
is  to  make  statements  that  can  be  translated  back  and  forth  between 
the  reals  and  the  hyperreals.  In  fact,  we  forbid  this  type  of  explicit 
reference  to  the  domain  to  which  the  quantifiers  apply.  This  restriction 
is  described  technically  by  saying  that  we’re  only  allowing  first-order 
logic. 

Quantifiers can be nested. For example, I can state the commutativity
of addition as $\forall x\,\forall y\; x + y = y + x$, and the existence of additive inverses
as $\forall x\,\exists y\; x + y = 0$.

After the quantifier and the variable, we have some mathematical assertion,
in which we’re allowed to use the symbols $=$, $>$, $\times$ and $+$ for
the basic operations of arithmetic, and also parentheses and the logical
operators $\neg$, $\wedge$ and $\vee$ for “not,” “and,” and “or.” Although we will
often find it convenient to use other symbols, such as 0, 1, $-$, $/$, $<$,
$\neq$, etc., these are not strictly necessary. We use them only as a way of
making the formulas more readable, with the understanding that they
could be translated into the more basic symbols. For instance, I can
restate $\exists x\; x > 1$ as $\exists x\,\exists y\,\forall z\; yz = z \wedge x > y$. The number y ends up
just being a name for 1, because it’s the only number that will always
satisfy $yz = z$.

Finally, these statements need to satisfy certain syntactic rules. For
example, we can’t have a string of symbols like $x + \times y$, because the
operators $+$ and $\times$ are supposed to have numbers on both sides.

A finite string of symbols satisfying all the above rules is called a well-formed
formula (wff) in first-order logic.
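To make the idea of a syntactic rule concrete, here is a minimal sketch of a checker for a toy fragment of the language. The grammar is my own simplification (single-letter variables, parentheses, and the two operators + and ×, with no quantifiers or logical operators), so it illustrates the "numbers on both sides" rule and nothing more.

```python
# Toy grammar (an assumption made for illustration, far smaller than
# first-order logic):
#   expr    := operand (('+' | '×') operand)*
#   operand := a single letter | '(' expr ')'

def is_well_formed(s: str) -> bool:
    """Return True if s obeys the toy grammar above."""
    tokens = [c for c in s if not c.isspace()]
    pos = 0

    def operand() -> bool:
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isalpha():
            pos += 1
            return True
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if not expr():
                return False
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
        return False

    def expr() -> bool:
        nonlocal pos
        if not operand():
            return False
        # Every operator must be followed by another operand.
        while pos < len(tokens) and tokens[pos] in "+×":
            pos += 1
            if not operand():
                return False
        return True

    return expr() and pos == len(tokens)

if __name__ == "__main__":
    for s in ("x + x × y", "x + × y", "(x + y) × z"):
        print(repr(s), is_well_formed(s))
```

The string x + × y is rejected because the operator × has no operand on its left, which is exactly the kind of string the rule is meant to exclude.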

The  transfer  principle  states  that  a  wff  is  true  on  the  real  numbers  if 
and  only  if  it  is  true  on  the  hyperreal  numbers. 

If  you  look  in  an  elementary  algebra  textbook  at  the  statement  of  all  the 
elementary  axioms  of  the  real  number  system,  such  as  commutativity 
of  multiplication,  associativity  of  addition,  and  so  on,  you’ll  see  that 
they  can  all  be  expressed  in  terms  of  first-order  logic,  and  therefore 
you  can  use  them  when  manipulating  hyperreal  numbers.  However,  it’s 
not  possible  to  fully  characterize  the  real  number  system  without  giving 
at least some further axioms that cannot be expressed in first-order logic.
There is more than one way to set up these additional axioms, but
one common axiom to use is the Archimedean principle,
which states that there is no number that is greater than 1, greater
than 1 + 1, greater than 1 + 1 + 1, and so on. If we try to express
this as a well-formed formula in first-order logic, one attempt would
be $\neg\exists x\; x > 1 \wedge x > 1 + 1 \wedge x > 1 + 1 + 1 \ldots$, where the $\ldots$
indicates that the string of symbols would have to go on forever. This
doesn’t work because a well-formed formula has to be a finite string
of symbols. Another attempt would be $\neg\exists x\,\forall n \in \mathbb{N}\; x > n$, where $\mathbb{N}$
means the set of integers. This one also fails to be a wff in first-order
logic, because in first-order logic we’re not allowed to explicitly refer
to  the  domain  of  a  quantifier.  We  conclude  that  the  transfer  principle 
does  not  necessarily  apply  to  the  Archimedean  principle,  and  in  fact 
the  Archimedean  principle  is  not  true  on  the  hyperreals,  because  they 
include  numbers  that  are  infinite. 

Now  that  we  have  a  thorough  and  rigorous  understanding  of  what  the 
transfer principle says, the next obvious question is why we should believe
that it’s true. This is discussed in the following section.

Is  the  transfer  principle  true? 

The  preceding  section  stated  the  transfer  principle  in  rigorous  language. 
But  why  should  we  believe  that  it’s  true? 

One  approach  would  be  to  begin  deducing  things  about  the  hyperreals, 
and  see  if  we  can  deduce  a  contradiction.  As  a  starting  point,  we  can 
use  the  axioms  of  elementary  algebra,  because  the  transfer  principle 
tells  us  that  those  apply  to  the  hyperreals  as  well.  Since  we  also  assume 
that the Archimedean principle does not hold for the hyperreals, we
can also base our reasoning on that, and therefore many of the things
we  can  prove  will  be  things  that  are  true  for  the  hyperreals,  but  false 
for  the  reals.  This  is  essentially  what  mathematicians  started  doing 
immediately  after  Newton  and  Leibniz  invented  the  calculus,  and  they 
were  immediately  successful  in  producing  contradictions.  However,  they 
weren’t  using  formally  defined  logical  systems,  and  they  hadn’t  stated 
anything  as  specific  and  rigorous  as  the  transfer  principle.  In  particular, 
they  didn’t  understand  the  need  for  anything  like  our  restriction  of  the 
transfer  principle  to  first-order  logic.  If  we  could  reach  a  contradiction 
based  on  the  more  modern,  rigorous  statement  of  the  transfer  principle, 
that  would  be  a  different  matter.  It  would  tell  us  that  one  of  two  things 
was  true:  either  (1)  the  hyperreal  number  system  lacks  logical  self- 
consistency,  or  (2)  both  the  hyperreals  and  the  reals  lack  self-consistency. 

Abraham  Robinson  proved,  however,  around  1960  that  the  reals  and  the 
hyperreals  have  the  same  level  of  consistency:  one  is  self-consistent  if 
and  only  if  the  other  is.  In  other  words,  if  the  hyperreals  harbor  a  ticking 
logical  time  bomb,  so  do  the  reals.  Since  most  mathematicians  don’t 
lose  much  sleep  worrying  about  a  lack  of  self-consistency  in  the  real 
number  system,  this  is  generally  taken  as  meaning  that  infinitesimals 
have  been  rehabilitated.  In  fact,  it  gives  them  an  even  higher  level 
of  respectability  than  they  had  in  the  era  of  Gauss  and  Euler,  when 
they  were  widely  used,  but  mathematicians  knew  a  valid  style  of  proof 
involving  infinitesimals  only  because  they’d  slowly  developed  the  right 
“Spidey  sense.” 

But  how  in  the  world  could  Robinson  have  proved  such  a  thing?  It  seems 
like  a  daunting  task.  There  is  an  infinite  number  of  possible  logical  trains 
of  argument  in  mathematics.  How  could  he  have  demonstrated,  with  a 
stroke  of  a  pen,  that  none  of  them  could  ever  lead  to  a  contradiction 
(unless  it  indicated  a  contradiction  lurking  in  the  real  number  system 
as  well)?  Obviously  it’s  not  possible  to  check  them  all  explicitly. 

The  way  modern  logicians  prove  such  things  is  usually  by  using  models. 
For  an  easy  example  of  a  model,  consider  Euclidean  geometry.  Euclid 
believed that the following four postulates2 were all self-evident:

1. Let the following be postulated: to draw a straight line from any
point to any point.

2. To extend a finite straight line continuously in a straight line.

3. To describe a circle with any center and radius.

4. That all right angles are equal to one another.

2 Modified slightly by me from a translation by T.L. Heath, 1925.

These  postulates,  which  today  we  would  call  “axioms,”  played  the  same 
role  with  respect  to  Euclidean  geometry  that  the  elementary  axioms  of 
arithmetic  play  for  the  real  number  system. 

Euclid  also  found  that  he  needed  a  fifth  postulate  in  order  to  prove  many 
of  his  most  important  theorems,  such  as  the  Pythagorean  theorem.  I'll 
state  a  different  axiom  that  turns  out  to  be  equivalent  to  it: 

5.  Playfair’s  version  of  the  parallel  postulate:  Given  any  infinite 
line  L,  and  any  point  P  not  on  that  line,  there  exists  a  unique  infinite 
line  through  P  that  never  crosses  L. 

The  ancients  believed  this  to  be  less  obviously  self-evident  than  the  first 
four,  partly  because  if  you  were  given  the  two  lines,  it  could  theoretically 
take  an  infinite  amount  of  time  to  inspect  them  and  verify  that  they 
never  crossed,  even  at  some  very  distant  point.  Euclid  avoided  even 
mentioning  infinite  lines  in  postulates  1-4,  and  he  considered  postulate  5 
to  be  so  much  less  intuitively  appealing  in  comparison  that  he  organized 
the  Elements  so  that  the  first  28  propositions  were  those  that  could  be 
proved  without  resorting  to  it.  Continuing  the  analogy  with  the  reals 
and  hyperreals,  the  parallel  postulate  plays  the  role  of  the  Archimedean 
principle:  a  statement  about  infinity  that  we  don’t  feel  quite  so  sure 
about. 

For centuries, geometers tried to prove the parallel postulate from the
first four. The trouble with this kind of thing was that it could be difficult
to  tell  what  was  a  valid  proof  and  what  wasn’t.  The  postulates  were 
written  in  an  ambiguous  human  language,  not  a  formal  logical  system. 
As  an  example  of  the  kind  of  confusion  that  could  result,  suppose  we 
assume  the  following  postulate,  5',  in  place  of  5: 

5'. Given any infinite line L, and any point P not on that line, every
infinite  line  through  P  crosses  L. 

Postulate  5'  plays  the  role  for  noneuclidean  geometry  that  the  negation 
of  the  Archimedean  principle  plays  for  the  hyperreals.  It  tells  us  we’re 
not  in  Kansas  anymore.  If  a  geometer  can  start  from  postulates  1-4 
and  5'  and  arrive  at  a  contradiction,  then  he’s  made  significant  progress 
toward  proving  that  postulate  5  has  to  be  true  based  on  postulates  1-4. 
(He  would  also  have  to  disprove  another  version  of  the  postulate,  in 
which  there  is  more  than  one  parallel  through  P.)  For  centuries,  there 
have  been  reasonable-sounding  arguments  that  seemed  to  give  such  a 
contradiction. For instance, it was proved that a geometry with 5' in it
was one in which distances were limited to some finite maximum. This
would  appear  to  contradict  postulate  3,  since  there  would  be  a  limit 
on  the  radius  of  a  circle.  But  there’s  plenty  of  room  for  disagreement 
here,  because  the  ancient  Greeks  didn’t  have  any  notion  of  a  set  of  real 
numbers.  For  them,  the  thing  we  would  call  a  number  was  simply  a 
finite  straight  line  (line  segment)  with  a  certain  length.  If  postulate 
3  says  that  we  can  make  a  circle  given  any  radius,  it’s  reasonable  to 
interpret  that  as  a  statement  that  given  any  finite  straight  line  as  the 
specification  of  the  radius,  we  can  make  the  circle.  There  is  then  no 
contradiction,  because  the  too-long  radius  can’t  be  specified  in  the  first 
place.  This  muddle  is  similar  to  the  kind  of  confusion  that  reigned  for 
centuries  after  Newton:  did  infinitesimals  lead  to  contradictions? 

In the 19th century, Lobachevsky and Bolyai came up with a version of
Euclid’s axioms that was more rigorously defined, and that was carefully
engineered to avoid the kinds of contradictions that had previously
been discovered in noneuclidean geometry. This is analogous to the invention
of the transfer principle and the realization that the restriction
to first-order logic was necessary. Lobachevsky and Bolyai slaved away
for year after year proving new results in noneuclidean geometry, wondering
whether they would ever reach a contradiction. Eventually they
started to doubt that there were ever going to be contradictions, and
finally they proved that the contradictions didn’t exist.

The technique for proving consistency was to make a model of the noneuclidean
system. Consider geometry done on the surface of a sphere. The
word “line” in the axioms now has to be understood as referring to a
great circle, i.e., one with the same radius as the sphere. The parallel
postulate  fails,  because  parallels  don’t  exist:  every  great  circle  intersects 
every  other  great  circle.  One  modification  has  to  be  made  to  the  model 
in  order  to  make  it  consistent  with  the  first  postulate.  The  constructions 
described  in  Euclid’s  postulates  are  tacitly  assumed  to  be  unique  (and 
in  more  rigorous  formulations  are  explicitly  stated  to  be  so).  We  want 
there  to  be  a  unique  line  defined  by  any  two  distinct  points.  This  works 
fine  on  the  sphere  as  long  as  the  points  aren’t  too  far  apart,  but  it  fails  if 
the  points  are  antipodes,  i.e.,  they  lie  at  opposite  sides  of  the  sphere.  For 
example,  every  line  of  longitude  on  the  Earth’s  surface  passes  through 
both  poles.  The  solution  to  this  problem  is  to  modify  what  we  mean  by 
“point.”  Points  at  each  other’s  antipodes  are  considered  to  be  the  same 
point.  (Or,  equivalently,  we  can  do  geometry  on  a  hemisphere,  but  agree 
that  when  we  go  off  one  edge,  we  “wrap  around”  to  the  opposite  side.) 

This spherical model obeys all the postulates of this particular system of
noneuclidean geometry. But consider now that we constructed it inside
a surrounding three-dimensional space in which the parallel postulate
does hold. Now suppose we keep on proving theorems in this system
of noneuclidean geometry, filling up page after page with proofs using
words like “line,” which we mentally associate with great circles on a
certain sphere, and eventually we reach a contradiction. But now we
can go back through our proofs, and in every place where the word “line”
occurs we can cross it out with a red pencil and put in “great circle on
this particular sphere.” It would now be a proof about Euclidean geometry,
and the contradiction would prove that Euclidean geometry lacked
self-consistency. We therefore arrive at the result that if noneuclidean
geometry is inconsistent, so is Euclidean geometry. Since nobody believes
that Euclidean geometry is inconsistent, this is considered the
moral equivalent of proving noneuclidean geometry to be consistent.

If  you’ve  been  keeping  the  system  of  analogies  in  mind  as  you  read  this 
story,  it  should  be  clear  what’s  coming  next.  If  we  want  to  prove  that 
the  hyperreals  have  the  same  consistency  as  the  reals,  we  just  have  to 
construct  a  model  of  the  hyperreals  using  the  reals.  This  is  done  in  detail 
elsewhere  (see  Stroyan  and  Mathforum.org  in  the  references,  p.  201). 
I’ll  just  sketch  the  general  idea.  A  hyperreal  number  is  represented  by 
an  infinite  sequence  of  real  numbers.  For  example,  the  sequence 

7, 7, 7, 7, ...

would be the hyperreal version of the number 7. A sequence like

1, 2, 3, ...

represents an infinite number, while

1, 1/2, 1/3, ...

is  infinitesimal.  All  the  arithmetic  operations  are  defined  by  applying 
them  to  the  corresponding  members  of  the  sequences.  For  example,  the 
sum  of  the  7,  7,  7,  . . .  sequence  and  the  1,  2,  3,  . . .  sequence  would  be  8, 
9,  10,  . . . ,  which  we  interpret  as  a  somewhat  larger  infinite  number. 
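The termwise arithmetic is easy to try out. In this sketch a sequence is represented, purely for illustration, as a Python function from the position n = 1, 2, 3, ... to the n-th term; the helper names are my own and are not part of any standard construction.

```python
from fractions import Fraction

def constant(c):
    """The sequence c, c, c, ... standing for the ordinary number c."""
    return lambda n: c

def add(a, b):
    """Termwise sum of two sequences."""
    return lambda n: a(n) + b(n)

def mul(a, b):
    """Termwise product of two sequences."""
    return lambda n: a(n) * b(n)

def first_terms(a, k=5):
    """The first k terms of a sequence, for inspection."""
    return [a(n) for n in range(1, k + 1)]

seven = constant(7)
H = lambda n: n                  # 1, 2, 3, ...       (infinite)
eps = lambda n: Fraction(1, n)   # 1, 1/2, 1/3, ...   (infinitesimal)

if __name__ == "__main__":
    print(first_terms(add(seven, H)))   # the sum discussed in the text
    print(first_terms(mul(H, eps)))     # infinite times infinitesimal
```

Only finitely many terms can ever be printed, so this shows the bookkeeping, not the hyperreal number itself.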

The  big  problem  in  this  approach  is  how  to  compare  hyperreals,  because 
a  comparison  like  <  is  supposed  to  give  an  answer  that  is  either  true  or 
false.  It’s  not  supposed  to  give  a  hyperreal  number  as  the  result. 

It’s clear that 8, 9, 10, ... is greater than 1, 1, 1, ..., because every
member of the first sequence is greater than every member of the second
one. But is 8, 9, 10, ... greater than 9, 9, 9, ...? We want the
answer to be “yes,” because we’re thinking of the first one as an infinite
number  and  the  second  one  as  the  ordinary  finite  number  9.  The  first 
sequence  is  indeed  greater  than  the  second  at  almost  every  one  of  the 
infinite  number  of  places  at  which  they  could  be  compared.  The  only 
place  where  it  loses  the  contest  is  at  the  very  first  position,  and  the 
only  spot  where  we  get  a  tie  is  the  second  one.  Essentially  the  idea  is 
that  we  want  to  define  a  concept  of  what  happens  “almost  everywhere” 
on  some  infinite  list.  If  one  thing  happens  in  an  infinite  number  of 
places  and  something  else  only  happens  at  some  finite  number  of  spots, 
then  the  definition  of  “almost  everywhere”  is  clear.  What’s  harder  is  a 
comparison  of  something  like  these  two  sequences: 

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...

and

1, 3, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 3, ...


where the second sequence has longer and longer runs of ones interspersed
between the threes. The two sequences are never equal at any
position,  so  clearly  they  can’t  be  considered  to  be  equal  as  hyperreal 
numbers.  But  there  is  an  infinite  number  of  spots  in  which  the  first 
sequence  is  greater  than  the  second,  and  likewise  an  infinite  number  in 
which  it’s  less.  It  seems  as  though  there  are  more  in  which  it’s  greater, 
so  we  probably  want  to  define  the  second  sequence  as  being  a  hyperreal 
number  that’s  less  than  2.  The  problem  is  that  it  can  be  very  difficult  to 
write  down  an  acceptable  definition  of  this  “almost  everywhere”  notion. 
The  answer  is  very  technical,  and  I  won’t  go  into  it  here,  but  it  can  be 
done.  Because  two  sequences  could  be  equal  almost  everywhere,  we  end 
up  having  to  define  a  hyperreal  number  not  as  a  particular  sequence  but 
as  a  set  of  sequences  that  are  equal  to  each  other  almost  everywhere. 
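The difficulty can be seen by simply tallying positions. The sketch below counts, over the first several terms, where one sequence is less than, equal to, or greater than another. Sequences are again represented as functions of the position n (my own convention), and the tally is only an illustration: it is not the technical definition of “almost everywhere” alluded to above, which a finite count cannot supply.

```python
def compare_counts(a, b, n_terms):
    """Count positions 1..n_terms where a < b, a == b, and a > b."""
    less = equal = greater = 0
    for n in range(1, n_terms + 1):
        x, y = a(n), b(n)
        if x < y:
            less += 1
        elif x == y:
            equal += 1
        else:
            greater += 1
    return less, equal, greater

def ones_and_threes(n):
    """n-th term of 1,3, 1,1,3, 1,1,1,3, ...: block k is k ones, then a 3."""
    k = 1
    while n > k + 1:      # skip whole blocks of length k + 1
        n -= k + 1
        k += 1
    return 3 if n == k + 1 else 1

if __name__ == "__main__":
    # 8, 9, 10, ... versus 9, 9, 9, ...: one loss, one tie, then all wins.
    print(compare_counts(lambda n: n + 7, lambda n: 9, 1000))
    # 2, 2, 2, ... versus the ones-and-threes sequence: both tallies keep growing.
    print(compare_counts(lambda n: 2, ones_and_threes, 1000))
```

In the first comparison the exceptions stop after the second position, so the verdict is clear. In the second, both the "less" and "greater" counts grow without bound as more terms are examined, which is why a more careful definition is needed.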


With  the  construction  of  this  model,  it  is  possible  to  prove  that  the 
hyperreals  have  the  same  level  of  consistency  as  the  reals. 


The  transfer  principle  applied  to  functions 

On  page  34,  I  told  you  not  to  worry  about  whether  it  was  legitimate 
to apply familiar functions like $x^2$, $\sqrt{x}$, $\sin x$, $\cos x$, and $e^x$ to hyperreal
numbers.  But  since  you’re  reading  this,  you’re  obviously  in  need  of  more 
reassurance. 

For some of these functions, the transfer principle straightforwardly
guarantees that they work for hyperreals, have all the familiar properties,
and can be computed in the same way. For example, the following
statement is in a suitable form to have the transfer principle applied to
it: For any real number x, $x \cdot x \ge 0$. Changing “real” to “hyperreal,”
we find out that the square of a hyperreal number is greater than or
equal to zero, just like the square of a real number. Writing it as $x^2$
or calling it a square is just a matter of notation and terminology. The
same applies to this statement: For any real number $x \ge 0$, there exists
a real number y such that $y^2 = x$. Applying the transfer principle to it
tells us that square roots can be defined for the hyperreals as well.

There’s a problem, however, when we get to functions like $\sin x$ and
$e^x$. If you look up the definition of the sine function in a trigonometry
textbook,  it  will  be  defined  geometrically,  as  the  ratio  of  the  lengths  of 
two  sides  of  a  certain  triangle.  The  transfer  principle  doesn’t  apply  to 
geometry,  only  to  arithmetic.  It’s  not  even  obvious  intuitively  that  it 
makes  sense  to  define  a  sine  function  on  the  hyperreals.  In  an  application 
like  the  differentiation  of  the  sine  function  on  page  28,  we  only  had  to 
take  sines  of  hyperreal  numbers  that  were  infinitesimally  close  to  real 
numbers,  but  if  the  sine  is  going  to  be  a  full-fledged  function  defined  on 
the  hyperreals,  then  we  should  be  allowed,  for  example,  to  take  the  sine 
of  an  infinite  number.  What  would  that  mean?  If  you  take  the  sine  of  a 
number  like  a  million  or  a  billion  on  your  calculator,  you  just  get  some 
apparently random result between −1 and 1. The sine function wiggles
back and forth indefinitely as x gets bigger and bigger, never settling
down to any specific limiting value. Apparently we could have $\sin H = 1$
for a particular infinite H, and then $\sin(H + \pi/2) = 0$, $\sin(H + \pi) = -1$, ...

It turns out that the moral equivalent of the transfer principle can indeed
be applied to any function on the reals, yielding a function that is in
some sense its natural “big brother” on the hyperreals, but the
consequences can be either disturbing or exhilarating depending on your
tastes. For example, consider the function $\lfloor x \rfloor$ that takes a real number
x and rounds it down to the greatest integer that is less than or equal
to x, e.g., $\lfloor 3 \rfloor = 3$, and $\lfloor \pi \rfloor = 3$. This function, like any other real
function, can be extended to the hyperreals, and that means that we
can define the hyperintegers, the set of hyperreals that satisfy $\lfloor x \rfloor = x$.
The hyperintegers include the integers as a subset, but they also include
infinite numbers. This is likely to seem magical, or even unreasonable,
if we come at the hyperreals from a purely axiomatic point of view. The
extension of functions to the hyperreals seems much more natural in
view of the construction of the hyperreals in terms of sequences given in
the preceding section. For example, the sequence 1.3, 2.3, 3.3, 4.3, 5.3, ...
represents an infinite number. If we apply the $\lfloor x \rfloor$ function to it, we get
1, 2, 3, 4, 5, ..., which is an infinite integer.

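In the sequence picture, extending a function just means applying it term by term. The following sketch does this for the rounding-down function, with sequences represented as functions of the position n and exact fractions used to avoid floating-point rounding; the names `extend` and `first_terms` are hypothetical helpers of my own.

```python
import math
from fractions import Fraction

def extend(f):
    """Lift a real function f to sequences by applying it to every term."""
    return lambda a: (lambda n: f(a(n)))

def first_terms(a, k=5):
    """The first k terms of a sequence, for inspection."""
    return [a(n) for n in range(1, k + 1)]

floor_ext = extend(math.floor)

# The sequence 1.3, 2.3, 3.3, ... from the text, stored as exact fractions.
H = lambda n: Fraction(10 * n + 3, 10)

if __name__ == "__main__":
    rounded = floor_ext(H)
    print(first_terms(rounded))
    # Rounding down a second time changes nothing, term by term,
    # which is the sequence-level version of the condition for a hyperinteger.
    print(first_terms(floor_ext(rounded)) == first_terms(rounded))
```

The same `extend` helper could be applied to `math.sin` or any other real function, which is the sense in which every real function has a natural extension in this model.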

Proof  of  the  chain  rule 

In  the  statement  of  the  chain  rule  on  page  37,  I  followed  my  usual 
custom  of  writing  derivatives  as  dy/dx,  when  actually  the  derivative  is 
the standard part, st(dy/dx). In more rigorous notation, the chain rule
should be stated like this:

$$\mathrm{st}\left(\frac{dz}{dx}\right) = \mathrm{st}\left(\frac{dz}{dy}\right)\,\mathrm{st}\left(\frac{dy}{dx}\right)$$

The transfer principle allows us to rewrite the left-hand side as
st[(dz/dy)(dy/dx)], and then we can get the desired result using the
identity st(ab) = st(a)st(b).


Derivative of $e^x$

All of the reasoning on page 39 would have applied equally well to any
other exponential function with a different base, such as $2^x$ or $10^x$.
Those functions would have different values of c, so if we want to determine
the value of c for the base-e case, we need to bring in the definition
of e, or of the exponential function $e^x$, somehow.

We can take the definition of $e^x$ to be

$$e^x = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n.$$

The  idea  behind  this  relation  is  similar  to  the  idea  of  compound  interest. 
If  the  interest  rate  is  10%,  compounded  annually,  then  x  =  0.1,  and 
the  balance  grows  by  a  factor  (1  +  x)  =  1.1  in  one  year.  If,  instead, 
we  want  to  compound  the  interest  monthly,  we  can  set  the  monthly 
interest  rate  to  0.1/12,  and  then  the  growth  of  the  balance  over  a  year 
is $(1 + x/12)^{12} = 1.1047$, which is slightly larger because the interest from
the earlier months itself accrues interest in the later months. Continuing
this limiting process, we find $e^{0.1} = 1.1052$.
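The compounding process is easy to reproduce numerically. This is a small sketch (the function name is my own) that evaluates the growth factor for annual, monthly, daily, and much finer compounding, and compares it with the library exponential.

```python
import math

def growth_factor(x, n):
    """Balance multiplier after one year at annual rate x, compounded n times."""
    return (1 + x / n) ** n

if __name__ == "__main__":
    for n in (1, 12, 365, 10**6):
        print(n, round(growth_factor(0.1, n), 4))
    print("exp(0.1):", round(math.exp(0.1), 4))
```

The values climb from 1.1 toward the limiting value as n increases, with most of the change already accounted for by monthly compounding.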

If n is large, then we have a good approximation to the base-e exponential,
so let’s differentiate this finite-n approximation and try to
find an approximation to the derivative of $e^x$. The chain rule tells us
that the derivative of $(1 + x/n)^n$ is the derivative of the raising-to-the-nth-power
function, multiplied by the derivative of the inside stuff,
$d(1 + x/n)/dx = 1/n$. We then have

$$\frac{d}{dx}\left(1 + \frac{x}{n}\right)^n = \left[n\left(1 + \frac{x}{n}\right)^{n-1}\right]\frac{1}{n} = \left(1 + \frac{x}{n}\right)^{n-1}.$$


But evaluating this at x = 0 simply gives 1, so at x = 0, the approximation
to the derivative is exactly 1 for all values of n; it’s not even
necessary to imagine going to larger and larger values of n. This establishes
that c = 1, so we have

$$\frac{de^x}{dx} = e^x$$

for all values of x.
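The claim that the slope at x = 0 is 1 for every n, not just for large n, can be spot-checked numerically. The sketch below uses a central difference with a small but finite step h, which is my own choice of method and only approximates the derivative; the function names are likewise assumptions.

```python
def approx_exp(x, n):
    """Finite-n approximation (1 + x/n)^n to e^x."""
    return (1 + x / n) ** n

def slope_at_zero(n, h=1e-5):
    """Central-difference estimate of d/dx (1 + x/n)^n at x = 0."""
    return (approx_exp(h, n) - approx_exp(-h, n)) / (2 * h)

def exact_slope(x, n):
    """The closed form derived in the text: (1 + x/n)^(n - 1)."""
    return (1 + x / n) ** (n - 1)

if __name__ == "__main__":
    for n in (1, 2, 10, 1000):
        print(n, slope_at_zero(n), exact_slope(0.0, n))
```

The numerical estimate differs from 1 only by the small error inherent in using a finite step, while the closed form gives 1 exactly.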


Proofs of the generalizations of l’Hopital’s rule

Multiple  applications  of  the  rule 

Here we prove, as claimed on p. 66, that the form of l’Hopital’s rule
given on p. 61 can be generalized to the case where more than
one application of the rule is required. The proof requires material
from ch. 4 (integration and the mean value theorem), and, as discussed
in example 86 on p. 112, the motivation for the result becomes much
more transparent once one has read ch. 7 and knows about Taylor series.
The  reader  who  has  arrived  here  while  reading  ch.  3  will  need  to  defer 
reading  this  section  of  the  proof  until  after  ch.  4,  and  may  wish  to  wait 
until  after  ch.  7. 

The  proof  can  be  broken  down  into  two  steps. 

Step 1: We first have to establish a stronger form of l’Hopital’s rule that
states that $\lim u/v = \lim \dot{u}/\dot{v}$ rather than $\lim u/v = \dot{u}/\dot{v}$. This form is
stronger, because in a case like example 47 on p. 66, $\dot{u}/\dot{v}$ isn’t defined,
but $\lim \dot{u}/\dot{v}$ is.

We prove the stronger form using the mean value theorem (p. 76). For
simplicity of notation, let’s assume that the limit is being taken at x = 0.
By the fundamental theorem of calculus, we have $u(x) = \int_0^x \dot{u}(x')\,dx'$,
and the mean value theorem then tells us that for some p between 0 and
x, $u(x) = x\dot{u}(p)$. Likewise for a q in this interval, $v(x) = x\dot{v}(q)$. So

$$\lim_{x\to 0}\frac{u}{v} = \lim_{x\to 0}\frac{\dot{u}(p)}{\dot{v}(q)},$$

but since both p and q are closer to zero than x is, the limit as they
simultaneously approach zero is the same as the limit as x approaches
zero.


Step  2:  If  we  need  to  take  n  derivatives,  the  proof  follows  by  applying 
the  extra-strength  rule  n  times.3 

Change  of  variable 


We will build up the rest of the features of l’Hopital’s rule using the
technique of a change of variable. To demonstrate how this works, let’s
imagine that we were starting from an even more stripped-down version
of l’Hopital’s rule than the one on p. 61. Say we only knew how to do
limits of the form $x \to 0$ rather than $x \to a$ for an arbitrary real number
a. We could then evaluate $\lim_{x\to a} u/v$ simply by defining $t = x - a$ and
reexpressing u and v in terms of t.

Example 104

> Reduce

$$\lim_{x\to\pi}\frac{\sin x}{x - \pi}$$

to a form involving a limit at 0.

o Define $t = x - \pi$. Solving for x gives $x = t + \pi$. We substitute into the above
expression to find

$$\lim_{x\to\pi}\frac{\sin x}{x - \pi} = \lim_{t\to 0}\frac{\sin(t + \pi)}{t}.$$

If all we knew was the $\to 0$ form of l’Hopital’s rule, then this would suffice to
reduce the problem to one we knew how to solve. In fact, this kind of change of
variable works in all cases, not just for a limit at $\pi$, so rather than going through
a laborious change of variable every time, we could simply establish the more
general form on p. 61, with $\to a$.
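As a sanity check on the substitution in example 104, the two forms can be evaluated numerically at matching points. This is only a sketch with small finite values of t standing in for the limit; the function names are my own.

```python
import math

def original(x):
    """sin x / (x - pi), whose limit is wanted as x -> pi."""
    return math.sin(x) / (x - math.pi)

def substituted(t):
    """sin(t + pi) / t, whose limit is wanted as t -> 0."""
    return math.sin(t + math.pi) / t

if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        # x = pi + t and t describe the same point, so the columns should agree.
        print(t, original(math.pi + t), substituted(t))
```

Both columns agree to within rounding and approach the same value as t shrinks, as expected if the change of variable leaves the limit unchanged.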


3 There is a logical subtlety here, which is that although we've given a clearcut
recipe for cooking up a proof for any given n, that isn't quite the same thing as
proving it for any positive integer n. This is an example where what we really need
is a technique called proof by induction. In general, proof by induction works like
this. Suppose we prove some statement about the integer 1, e.g., that l'Hôpital's
rule is valid when you take 1 derivative. Now say that we can also prove that if that
statement holds for a given n, it also holds for n + 1. Proof by induction means that
we can then consider the statement as having been proved for all positive integers.
For suppose the contrary. Then there would be some least n for which it failed, but
this would be a contradiction, since it would hold for n − 1.


The indeterminate form ∞/∞

To prove that l'Hôpital's rule works in general for ∞/∞ forms, we do a
change of variable on the outputs of the functions u and v rather than
their inputs. Suppose that our original problem is of the form

$$\lim \frac{u}{v} ,$$

where both functions blow up.4 We then define U = 1/u and V = 1/v.
We now have

$$\lim \frac{u}{v} = \lim \frac{1/U}{1/V} = \lim \frac{V}{U} ,$$

and since U and V both approach zero, we have reduced the problem
to one that can be solved using the version of l'Hôpital's rule already
proved for indeterminate forms like 0/0. Differentiating and applying
the chain rule, we have

$$\lim \frac{u}{v} = \lim \frac{\dot{V}}{\dot{U}} = \lim \frac{-v^{-2}\dot{v}}{-u^{-2}\dot{u}} .$$

Since lim ab = lim a lim b provided that lim a and lim b are both defined,
we can rearrange factors to produce the desired result.
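
The rearrangement of factors in the last step can be written out explicitly. This is only a sketch, and it assumes something the text does not state: that L = lim u/v and lim u̇/v̇ both exist, with L finite and nonzero.

```latex
\begin{align*}
L = \lim \frac{u}{v}
  &= \lim \frac{-v^{-2}\dot{v}}{-u^{-2}\dot{u}}
   = \lim \left(\frac{u}{v}\right)^{2} \frac{\dot{v}}{\dot{u}}
   = L^{2} \, \lim \frac{\dot{v}}{\dot{u}} \\
\Rightarrow \quad \lim \frac{\dot{v}}{\dot{u}} &= \frac{1}{L}
\quad \Rightarrow \quad
L = \lim \frac{\dot{u}}{\dot{v}} .
\end{align*}
```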

This  change  of  variable  is  a  specific  example  of  a  much  more  general 
method  of  problem-solving  in  which  we  look  for  a  way  to  reduce  a  hard 
problem  to  an  easier  one.  We  will  encounter  changes  of  variable  again  on 
p.  87  as  a  technique  for  integration,  which  means  undoing  the  operation 
of  differentiation. 


4 Think about what happens when only u blows up, or only v.


Proof of the fundamental theorem of calculus

There are three parts to the proof: (1) Take the equation that states
the fundamental theorem, differentiate both sides with respect to b, and
show that they're equal. (2) Show that continuous functions with equal
derivatives must be essentially the same function, except for an additive
constant. (3) Show that the constant in question is zero.

1. By the definition of the indefinite integral, the derivative of x(b) − x(a)
with respect to b equals ẋ(b). We have to establish that this equals the
following:

$$\begin{aligned}
\frac{d}{db} \int_a^b \dot{x}(t)\,dt
 &= \operatorname{st} \frac{1}{db} \left[ \int_a^{b+db} \dot{x}(t)\,dt - \int_a^b \dot{x}(t)\,dt \right] \\
 &= \operatorname{st} \frac{1}{db} \int_b^{b+db} \dot{x}(t)\,dt \\
 &= \operatorname{st} \frac{1}{db} \lim_{H \to \infty} \sum_{i=0}^{H} \dot{x}(b + i\,db/H)\,\frac{db}{H} \\
 &= \operatorname{st} \lim_{H \to \infty} \frac{1}{H} \sum_{i=0}^{H} \dot{x}(b + i\,db/H)
\end{aligned}$$

Since ẋ is continuous, all the values of ẋ occurring inside the sum can
differ only infinitesimally from ẋ(b). Therefore the quantity inside the
limit differs only infinitesimally from ẋ(b), and the standard part of its
limit must be ẋ(b).5


2. Suppose f and g are two continuous functions whose derivatives are
equal. Then d = f − g is a continuous function whose derivative is zero.
But the only continuous function with a derivative of zero is a constant,
so f and g differ by at most an additive constant.


3. I've established that the derivatives with respect to b of x(b) − x(a)
and ∫_a^b ẋ dt are the same, so they differ by at most an additive constant.
But at b = a, they're both zero, so the constant must be zero.
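
The two facts used in parts 1 and 3 can be checked numerically for a concrete function. The sketch below is an illustration only: it replaces the infinitesimal db with a small real number, uses a midpoint Riemann sum (a choice made here, not in the text), and picks x(t) = t^3 arbitrarily.

```python
def integral(f, a, b, n=10_000):
    """Midpoint Riemann sum of f over [a, b] using n strips."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def x(t):
    return t ** 3          # illustrative choice of x(t)

def xdot(t):
    return 3 * t ** 2      # its derivative

a, b, db = 1.0, 2.0, 1e-4

# Part 3: x(b) - x(a) and the integral of xdot agree.
print(x(b) - x(a), integral(xdot, a, b))

# Part 1: (1/db) times the integral from b to b + db is close to xdot(b).
print(integral(xdot, b, b + db) / db, xdot(b))
```

The second comparison is only approximate because db here is small but not infinitesimal; the discrepancy shrinks as db does.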


5 If you don't want to use infinitesimals, then you can express the derivative as a
limit, and in the final step of the argument use the mean value theorem, introduced
later in the chapter.


The intermediate value theorem

On  page  54  I  asserted  that  the  intermediate  value  theorem  was  really 
more  a  statement  about  the  (real  or  hyperreal)  number  system  than 
about  functions.  For  insight,  consider  figure  b,  which  is  a  geometrical 
construction  that  constitutes  the  proof  of  the  very  first  proposition  in 
Euclid’s  celebrated  Elements.  The  proposition  to  be  proved  is  that  given 
a  line  segment  AB,  it  is  possible  to  construct  an  equilateral  triangle  with 
AB  as  its  base.  The  proof  is  by  construction;  that  is,  Euclid  doesn’t 
just  give  a  logical  argument  that  convinces  us  the  triangle  must  exist, 
he  actually  demonstrates  how  to  construct  it.  First  we  draw  a  circle 
with  center  A  and  radius  AB,  which  his  third  postulate  says  we  can  do. 
Then  we  draw  another  circle  with  the  same  radius,  but  centered  at  B. 
Pick  one  of  the  intersections  of  the  circles  and  call  it  C.  Construct  the 
line  segments  AC  and  BC  (postulate  1).  Then  AC  equals  AB  by  the 
definition  of  the  circle,  and  likewise  BC  equals  AB.  Euclid  also  has  an 
axiom  that  things  equal  to  the  same  thing  are  equal  to  one  another,  so 
it  follows  that  AC  equals  BC,  and  therefore  the  triangle  is  equilateral. 


It seems like a model of mathematical rigor, but there's a flaw in the
reasoning, which is that he assumes without justification that the
circles do have a point in common. To see that this is not as secure an
assumption as it seems, consider the usual Cartesian representation of
plane geometry in terms of coordinates (x, y). Usually we assume that x
and y are real numbers. What if we instead do our Cartesian geometry
using rational numbers as coordinates? Euclid's five postulates are all
consistent with this. For example, circles do exist. Let A = (0, 0) and
B = (1, 0). Then there are infinitely many pairs of rational numbers in
the set that satisfies the definition of the circle centered at A. Examples
include (3/5, 4/5) and (−7/25, 24/25). The circle is also continuous in
the sense that if I specify a point on it such as (−7/25, 24/25), and a
distance that I'm allowed to make as small as I please, say 10^−6, then
other points exist on the circle within that distance of the given point.
However, the intersection assumed by Euclid's proof doesn't exist. It
would lie at (1/2, √3/2), but √3 doesn't exist in the rational number
system.
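
The claims about rational points on the circle centered at A can be checked with exact rational arithmetic. In the sketch below, the slope parametrization in `rational_point` is a standard construction that the text does not mention, and the final search is a finite brute-force check, not a proof that √3 is irrational.

```python
from fractions import Fraction

def on_unit_circle(x, y):
    # Exact test: no rounding, since Fractions are exact rationals.
    return x * x + y * y == 1

def rational_point(m):
    """Rational point where the line of slope m through (-1, 0)
    meets the unit circle a second time."""
    m = Fraction(m)
    return (1 - m * m) / (1 + m * m), 2 * m / (1 + m * m)

print(on_unit_circle(Fraction(3, 5), Fraction(4, 5)))
print(on_unit_circle(Fraction(-7, 25), Fraction(24, 25)))
print(rational_point(Fraction(1, 2)), rational_point(Fraction(4, 3)))

# At x = 1/2 the circle requires y^2 = 3/4. Search small fractions for such a y.
target = 1 - Fraction(1, 2) ** 2
hits = [Fraction(p, q) for q in range(1, 200) for p in range(1, 200)
        if Fraction(p, q) ** 2 == target]
print(hits)
```

Every rational slope m yields another rational point, which is one way to see that there are infinitely many of them, while the search for a rational y with y^2 = 3/4 comes back empty.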

In exactly the same way, we can construct counterexamples to the
intermediate value theorem if the underlying system of numbers doesn't
have the same properties as the real numbers. For example, let y = x^2.
Then y is a continuous function on the interval from 0 to 1, but if
we take the rational numbers as our foundation, then there is no x for
which y = 1/2. The solution would be x = 1/√2, which doesn't exist in
the rational number system. Notice the similarity between this problem
and the one in Euclid's proof. In both cases we have curves that cut
one another without having an intersection. In the present example, the
curves are the graphs of the functions y = x^2 and y = 1/2.

The interpretation is that the real numbers are in some sense more
densely packed than the rationals, and with two thousand years' worth of
hindsight, we can see that Euclid should have included a sixth postulate
that expressed this density property. One possible way of stating such
a postulate is the following. Let L be a ray, and O its endpoint. We
think of O as the origin of the positive number line. Let P and Q be
sets of points on L such that every point in P is closer to O than every
point in Q. Then there exists some point Z on L such that Z lies at
least as far from O as every point in P, but no farther than any point in
Q. Technically this property is known as completeness. As an example,
let P = {x | x^2 < 2} and Q = {x | x^2 > 2}. Then the point Z would
have to be √2, which shows that the rationals are not complete. The
reals are complete, and the completeness axiom can serve as one of the
fundamental axioms of the real numbers.

Note  that  the  axiom  refers  to  sets  P  and  Q,  and  says  that  a  certain 
fact  is  true  for  any  choice  of  those  sets;  it  therefore  isn’t  the  type  of 
proposition  that  is  covered  by  the  transfer  principle,  and  in  fact  it  fails 
for  the  hyperreals,  as  we  can  see  if  P  is  the  set  of  all  infinitesimals  and 
Q  the  positive  real  numbers. 

Here is a skeletal proof of the intermediate value theorem, in which I'll
make some simplifying assumptions and leave out some cases. We want
to prove that if y is a continuous real-valued function on the real interval
from a to b, and if y takes on values y_1 and y_2 at certain points x_1 and
x_2 within this interval, then for any y_3 between y_1 and y_2, there is some
real x in the interval for which y(x) = y_3. I'll assume the case in which
x_1 < x_2 and y_1 < y_2. Define sets of real numbers P = {x | y ≤ y_3}, and let
Q = {x | y ≥ y_3}. For simplicity, I'll assume that every member of P is
less than or equal to every member of Q, which happens, for example,
if the function y(x) is always increasing on the interval [a, b]. If P and
Q intersect, then the theorem holds. Suppose instead that P and Q do
not intersect. Using the completeness axiom, there exists some real x
which is greater than or equal to every element of P and less than or
equal to every element of Q. Suppose x belongs to P. Then the following
statement is in the right form for the transfer principle to apply to it:
for any number x' > x, y(x') > y_3. We can conclude that the statement
is also true for the hyperreals, so that if dx is a positive infinitesimal and
x' = x + dx, we have y(x) < y_3, but y(x + dx) > y_3. Then by continuity,
y(x) − y(x + dx) is infinitesimal. But y(x) < y_3 and y(x + dx) > y_3, so
the standard part of y(x) must equal y_3. By assumption y takes on real
values for real arguments, so y(x) = y_3. The same reasoning applies if
x belongs to Q, and since x must belong either to P or to Q, the result
is proved.
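
The completeness argument is not constructive, but under the same simplifying assumption (an increasing function) the point x can be approximated numerically. The sketch below uses bisection, which is a different technique from the proof in the text. The name `bisect_root` and the tolerance are illustrative choices.

```python
def bisect_root(y, a, b, y3, tol=1e-12):
    """Approximate x in [a, b] with y(x) = y3, assuming y is continuous,
    increasing, and y(a) <= y3 <= y(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if y(mid) <= y3:
            lo = mid   # mid lies in P = {x | y(x) <= y3}
        else:
            hi = mid   # mid lies in Q = {x | y(x) >= y3}
    return (lo + hi) / 2

# The example from the text: y = x^2 on [0, 1], looking for y = 1/2.
x = bisect_root(lambda x: x * x, 0.0, 1.0, 0.5)
print(x, x * x)
```

Floating-point numbers are themselves rational, so the routine can only squeeze the gap between P and Q. It never lands exactly on 1/√2, which echoes the point that the rationals are not complete.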

For  an  alternative  proof  of  the  intermediate  value  theorem  by  an  entirely 
different  technique,  see  Keisler  (references,  p.  201). 

As a side issue, we could ask whether there is anything like the
intermediate value theorem that can be applied to functions on the hyperreals.
Our definition of continuity on page 53 explicitly states that it only
applies to real functions. Even if we could apply the definition to a
function on the hyperreals, the proof given above would fail, since the
hyperreals lack the completeness property. As a counterexample, let ε
be some positive infinitesimal, and define a function y such that y = −ε
when st(x) < 0 and y = ε everywhere else. If we insist on applying
the definition of continuity to this function, it appears to be continuous,
so it violates the intermediate value theorem. Note, however, that the
way this function is defined is different from the way we usually define
functions on the hyperreals. Usually we define a function on the reals,
say y = x^2, in language to which the transfer principle applies, and then
we use the transfer principle to reason about the function's analog on
the hyperreals. For instance, the function y = x^2 has the property that
y ≥ 0 everywhere, and the transfer principle guarantees that that's also
true if we take y = x^2 as the definition of a function on the hyperreals.
For functions defined in this way, the intermediate value theorem makes
a statement that the transfer principle applies to, and it is therefore
true for the hyperreal version of the function as well.


Proof of the extreme value theorem

The extreme value theorem was stated on page 56. Before we can prove
it, we need to establish some preliminaries, which turn out to be
interesting for their own sake.

Definition:  Let  C  be  a  subset  of  the  real  numbers  whose  definition  can 
be  expressed  in  the  type  of  language  to  which  the  transfer  principle 
applies.  Then  C  is  compact  if  for  every  hyperreal  number  x  satisfying 
the  definition  of  C,  the  standard  part  of  x  exists  and  is  a  member  of  C. 

To  understand  the  content  of  this  definition,  we  need  to  look  at  the  two 
ways  in  which  a  set  could  fail  to  satisfy  it. 

First,  suppose  U  is  defined  by  x  >  0.  Then  there  are  positive  infinite 
hyperreal  numbers  that  satisfy  the  definition,  and  their  standard  part  is 
not  defined,  so  U  is  not  compact.  The  reason  U  is  not  compact  is  that 
it  is  unbounded. 

Second, let V be defined by 0 ≤ x < 1. Then if dx is a positive
infinitesimal, 1 − dx satisfies the definition of V, but its standard part is 1,
which is not in V, so V is not compact. The set V has boundary points at
0 and 1, and the reason it is not compact is that it doesn't contain its
right-hand boundary point. A boundary point is a real number which
is infinitesimally close to some points inside the set, and also to some
other points that are on the outside.

We  therefore  arrive  at  the  following  alternative  characterization  of  the 
notion  of  a  compact  set,  whose  proof  is  straightforward. 

Theorem:  A  set  is  compact  if  and  only  if  it  is  bounded  and  contains  all 
of  its  boundary  points. 

Intuitively, the reason compact sets are interesting is that if you're
standing inside a compact set and start taking steps in a certain direction,
without ever turning around, you're guaranteed to approach some point
in the set as a limit. (You might step over some gaps that aren't
included in the set.) If the set was unbounded, you could just walk forever
at a constant speed. If the set didn't contain its boundary point, then
you could asymptotically approach the boundary, but the goal you were
approaching wouldn't be a member of the set.

The  following  theorem  turns  out  to  be  the  most  difficult  part  of  the 
discussion. 

Theorem:  A  compact  set  contains  its  maximum  and  minimum. 

Proof: Let C be a compact set. We know it's bounded, so let M be the
set of all real numbers that are greater than any member of C. By the
completeness property of the real numbers, there is some real number x
between C and M. Let *C be the set of hyperreal numbers that satisfies
the same definition that C does.

Every real x' greater than x fails to satisfy the condition that defines
C, and by the transfer principle the same must be true if x' is any
hyperreal, so if dx is a positive infinitesimal, x + dx must be outside of
*C.

But now consider x − dx. The following statement holds for the reals:
there is no number x' < x that is greater than every member of C. By
the transfer principle, we find that there is some hyperreal number q
in *C that is greater than x − dx. But the standard part of q must
equal x, for otherwise st q would be a member of C that was greater
than x. Therefore x is a boundary point of C, and since C is compact,
x is a member of C. We conclude C contains its maximum. A similar
argument shows that C contains its minimum, so the theorem is proved.

There were two subtle things about this proof. The first was that we
ended up constructing the set of hyperreals *C, which was the hyperreal
"big brother" of the real set C. This is exactly the sort of thing that the
transfer principle does not guarantee we can do. However, if you look
back through the proof, you can see that *C is used only as a notational
convenience. Rather than talking about whether a certain number was a
member of *C, we could have referred, more cumbersomely, to whether
or not it satisfied the condition that had originally been used to define
C. The price we paid for this was a slight loss of generality. There
are so many different sets of real numbers that they can't possibly all
have explicit definitions that can be written down on a piece of paper.
However, there is very little reason to be interested in studying the
properties of a set that we were never able to define in the first place.
The other subtlety was that we had to construct the auxiliary point
x − dx, but there was not much we could actually say about x − dx
itself. In particular, it might or might not have been a member of *C.
For example, if C is defined by the condition x = 0, then *C likewise
contains only the single element 0, and x − dx is not a member of *C.
But if C is defined by 0 ≤ x ≤ 1, then x − dx is a member of *C.

The original goal was to prove the extreme value theorem, which is a
statement about continuous functions, but so far we haven't said
anything about functions.

Lemma: Let f be a real function defined on a set of points C. Let D be
the image of C, i.e., the set of all values f(x) that occur for some x in
C. Then if f is continuous and C is compact, D is compact as well. In
other words, continuous functions take compact sets to compact sets.

Proof: Let y = f(x) be any hyperreal output corresponding to a
hyperreal input x in *C. We need to prove that the standard part of y
exists, and is a member of D. Since C is compact, the standard part
of x exists and is a member of C. But then by continuity y differs only
infinitesimally from f(st x), which is real, so st y = f(st x) is defined and
is a member of D.


We  are  now  ready  to  prove  the  extreme  value  theorem,  in  a  version 
slightly  more  general  than  the  one  originally  given  on  page  56. 


The  extreme  value  theorem:  Any  continuous  function  on  a  compact  set 
achieves  a  maximum  and  minimum  value,  and  does  so  at  specific  points 
in  the  set. 


Proof: Let f be continuous, and let C be the compact set on which
we seek its maximum and minimum. Then the image D as defined in
the lemma above is compact. Therefore D contains its maximum and
minimum values.


Proof  of  the  mean  value  theorem 


Suppose that the mean value theorem is violated. Let L be the set of all
x in the interval from a to b such that y(x) < ȳ, and likewise let M be
the set with y(x) > ȳ. If the theorem is violated, then the union of these
two sets covers the entire interval from a to b. Neither one can be empty;
if, for example, M was empty, then we would have y < ȳ everywhere
and also ∫_a^b y = ∫_a^b ȳ, but it follows directly from the definition of the
definite integral that when one function is less than another, its integral
is also less than the other's. Since y takes on values less than and greater
than ȳ, it follows from the intermediate value theorem that y takes on
the value ȳ somewhere (intuitively, at a boundary between L and M).


Proof of the fundamental theorem of algebra

We  start  with  the  following  lemma,  which  is  intuitively  obvious,  because 
polynomials  don’t  have  asymptotes.  Its  proof  is  given  after  the  proof  of 
the  main  theorem. 

Lemma: For any polynomial P(z) in the complex plane, its magnitude
|P(z)| achieves its minimum value at some specific point z_0.

The fundamental theorem of algebra: In the complex number system, a
nonzero nth-order polynomial has exactly n roots, i.e., it can be factored
into the form P(z) = (z − a_1)(z − a_2) ··· (z − a_n), where the a_i are complex
numbers.

Proof: The proofs in the cases of n = 0 and 1 are trivial, so our strategy
is to reduce higher-n cases to lower ones. If an nth-degree polynomial P
has at least one root, a, then we can always reduce it to a polynomial of
degree n − 1 by dividing it by (z − a). Therefore the theorem is proved
by induction provided that we can show that every polynomial of degree
greater than zero has at least one root.

Suppose, on the contrary, that there is an nth-order polynomial P(z),
with n > 0, that has no roots at all. Then by the lemma |P| achieves
its minimum value at some point z_0. To make things more simple and
concrete, we can construct another polynomial Q(z) = P(z + z_0)/P(z_0),
so that |Q| has a minimum value of 1, achieved at Q(0) = 1. This means
that Q's constant term is 1. What about its other terms? Let Q(z) = 1 +
c_1 z + ... + c_n z^n. Suppose c_1 was nonzero. Then for infinitesimally small
values of z, the terms of order z^2 and higher would be negligible, and
we could make Q(z) be a real number less than one by an appropriate
choice of z's argument. Therefore c_1 must be zero. But that means that
if c_2 is nonzero, then for infinitesimally small z, the z^2 term dominates
the z^3 and higher terms, and again this would allow us to make Q(z) be
real and less than one for appropriately chosen values of z. Continuing
this process, we find that Q(z) has no terms at all beyond the constant
term, i.e., Q(z) = 1. This contradicts the assumption that n was greater
than zero, so we've proved by contradiction that there is no P with the
properties claimed.
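
The key step, choosing the argument of z so that the lowest-order nonzero term pulls |Q| below 1, can be illustrated numerically. The sketch below substitutes a small real radius for the infinitesimal z of the proof. The function names and the sample coefficients are made up for illustration.

```python
import cmath

def Q(z, coeffs):
    """Evaluate 1 + c_1 z + c_2 z^2 + ... for coeffs = [c_1, c_2, ...]."""
    return 1 + sum(c * z ** (k + 1) for k, c in enumerate(coeffs))

def lowering_point(coeffs, r):
    """Return z with |z| = r whose argument makes the lowest-order
    nonzero term c_k z^k real and negative."""
    for k, c in enumerate(coeffs, start=1):
        if c != 0:
            # c_k z^k has argument arg(c_k) + k*arg(z); set that equal to pi.
            theta = (cmath.pi - cmath.phase(c)) / k
            return cmath.rect(r, theta)
    raise ValueError("Q is the constant 1; there is no term to exploit")

coeffs = [0, 2 + 1j, -3j]          # Q(z) = 1 + (2+i) z^2 - 3i z^3
z = lowering_point(coeffs, 1e-3)
print(abs(Q(z, coeffs)))           # expected to fall slightly below 1
```

With a finite radius the higher-order terms are small, not negligible, so Q(z) is only approximately real. Its magnitude still drops below 1, which is the contradiction the proof relies on.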

Uninteresting proof of the lemma: Let M(r) be the minimum value of
|P(z)| on the disk defined by |z| ≤ r. We first prove that M(r) can't
asymptotically approach a minimum as r approaches infinity. Suppose
to the contrary: for every r, there is some r' > r with M(r') < M(r).
Then by the transfer principle, the same would have to be true for
hyperreal values of r. But it's clear that if r is infinite, the lower-order
terms of P will be infinitesimally small compared to the highest-order
term, and therefore M(r) is infinite for infinite values of r, which is
a contradiction, since by construction M is decreasing, and finite for
finite r. We can therefore conclude by the extreme value theorem that
M achieves its minimum for some specific value of r. The least such r
describes a circle |z| = r in the complex plane, and the minimum of |P|
on this circle must be the same as its global minimum. Applying the
extreme value theorem to |P(z)| as a function of arg z on the interval
0 ≤ arg z ≤ 2π, we establish the desired result.


B Answers and solutions


Answers  to  Self-Checks 

Answers  to  self-checks  for  chapter  4 

page  80,  self-check  1: 

The area under the curve from 130 to 135 cm is about 3/4 of a rectangle.
The area from 135 to 140 cm is about 1.5 rectangles. The number of
people in the second range is about twice as much. We could have converted
these to actual probabilities (1 rectangle = 5 cm × 0.005 cm^−1 = 0.025),
but that would have been pointless, because we were just going to
compare the two areas.

Answers  to  self-checks  for  chapter  6 

page 120, self-check 1: Say we're looking for u = √z, i.e., we want a
number u that, multiplied by itself, equals z. Multiplication multiplies
the magnitudes, so the magnitude of u can be found by taking the square
root of the magnitude of z. Since multiplication also adds the arguments
of the numbers, squaring a number doubles its argument. Therefore we
can simply divide the argument of z by two to find the argument of
u. This results in one of the square roots of z. There is another one,
which is −u, since (−u)^2 is the same as u^2. This may seem a little odd:
if u was chosen so that doubling its argument gave the argument of z,
then how can the same be true for −u? Well for example, suppose the
argument of z is 4°. Then arg u = 2°, and arg(−u) = 182°. Doubling
182 gives 364, which is actually a synonym for 4 degrees.
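
The recipe in this self-check (take the square root of the magnitude, halve the argument) translates directly into a few lines using Python's standard `cmath` module. The helper name `square_roots` and the choice of magnitude 1 for z are illustrative assumptions.

```python
import cmath
import math

def square_roots(z):
    """Both square roots of z, found by halving the argument."""
    r, theta = cmath.polar(z)                 # magnitude and argument of z
    u = cmath.rect(math.sqrt(r), theta / 2)   # sqrt of magnitude, half the argument
    return u, -u

z = cmath.rect(1.0, math.radians(4))          # argument 4 degrees, magnitude 1
u, minus_u = square_roots(z)

print(math.degrees(cmath.phase(u)))           # about 2 degrees
print(math.degrees(cmath.phase(minus_u)))     # about -178, a synonym for 182
print(abs(u * u - z), abs(minus_u * minus_u - z))
```

`cmath.phase` reports angles between −180° and 180°, so the argument of −u shows up as −178° instead of 182°; the two differ by a full turn.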


Solutions  to  homework  problems 


Solutions  for  chapter  1 


page  21,  problem  1: 


The tangent line has to pass through the point (3, 9), and it also seems,
at least approximately, to pass through (1.5, 0). This gives it a slope of
(9 − 0)/(3 − 1.5) = 9/1.5 = 6, and that's exactly what 2t is at t = 3.


a / Problem 1.


page  21,  problem  2: 


The tangent line has to pass through the point (0, sin(e^0)) = (0, 0.84),
and it also seems, at least approximately, to pass through (−1.6, 0). This
gives it a slope of (0.84 − 0)/(0 − (−1.6)) = 0.84/1.6 = 0.53. The more
accurate result given in the problem can be found using the methods of
chapter 2.


b / Problem 2.


page  21,  problem  3: 

The derivative is a rate of change, so the derivatives of the constants
1 and 7, which don't change, are clearly zero. The derivative can be
interpreted geometrically as the slope of the tangent line, and since the
functions t and 7t are lines, their derivatives are simply their slopes, 1
and 7. All of these could also have been found using the formula that
says the derivative of t^k is kt^(k−1), but it wasn't really necessary to get
that fancy. To find the derivative of t^2, we can use the formula, which
gives 2t. One of the properties of the derivative is that multiplying a
function by a constant multiplies its derivative by the same constant, so
the derivative of 7t^2 must be (7)(2t) = 14t. By similar reasoning, the
derivatives of t^3 and 7t^3 are 3t^2 and 21t^2, respectively.

page  21,  problem  4: 

One of the properties of the derivative is that the derivative of a sum is
the sum of the derivatives, so we can get this by adding up the derivatives
of 3t^7, −4t^2, and 6. The derivatives of the three terms are 21t^6, −8t,
and 0, so the derivative of the whole thing is 21t^6 − 8t.
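
The term-by-term rule used here is easy to mechanize. The sketch below stores a polynomial as a list of coefficients (a representation chosen for illustration) and applies the rule that t^k differentiates to kt^(k−1).

```python
def differentiate(coeffs):
    """coeffs[k] is the coefficient of t**k; return the derivative's coefficients."""
    # Each term c*t**k becomes k*c*t**(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]

# 3t^7 - 4t^2 + 6, lowest power first
p = [6, 0, -4, 0, 0, 0, 0, 3]
print(differentiate(p))   # [0, -8, 0, 0, 0, 0, 21], i.e. 21t^6 - 8t
```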

page  21,  problem  5: 

This is exactly like problem 4, except that instead of explicit numerical
constants like 3 and −4, this problem involves symbolic constants a, b,
and c. The result is 2at + b.

page  21,  problem  6: 

The first thing that comes to mind is 3t. Its graph would be a line with
a slope of 3, passing through the origin. Any other line with a slope of
3 would work too, e.g., 3t + 1.

page 21, problem 7:

Differentiation lowers the power of a monomial by one, so to get
something with an exponent of 7, we need to differentiate something with an
exponent of 8. The derivative of t^8 would be 8t^7, which is eight times
too big, so we really need (t^8/8). As in problem 6, any other function
that differed by an additive constant would also work, e.g., (t^8/8) + 1.

page  21,  problem  8: 

This is just like problem 7, but we need something whose derivative
is three times bigger. Since multiplying by a constant multiplies the
derivative by the same constant, the way to accomplish this is to take
the answer to problem 7, and multiply by three. A possible answer is
(3/8)t^8, or that function plus any constant.

page  21,  problem  9: 

This is just a slight generalization of problem 8. Since the derivative
of a sum is the sum of the derivatives, we just need to handle each
term individually, and then add up the results. The answer is (3/8)t^8 −
(4/3)t^3 + 6t, or that function plus any constant.
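
An answer of this kind can be checked by differentiating it again. The sketch below assumes, judging from the stated answer, that the function to be recovered is 3t^7 − 4t^2 + 6. It uses exact fractions and a coefficient-list representation chosen for illustration.

```python
from fractions import Fraction

def antiderivative(coeffs, constant=0):
    """coeffs[k] multiplies t**k; each term c*t**k becomes c/(k+1) * t**(k+1)."""
    return [Fraction(constant)] + [Fraction(c, k + 1) for k, c in enumerate(coeffs)]

def differentiate(coeffs):
    """Inverse operation: c*t**k becomes k*c*t**(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

p = [6, 0, -4, 0, 0, 0, 0, 3]      # assumed integrand 3t^7 - 4t^2 + 6
F = antiderivative(p)
print(F)                            # coefficients of 6t - (4/3)t^3 + (3/8)t^8
print(differentiate(F) == p)        # differentiating recovers the integrand
print(differentiate(antiderivative(p, constant=5)) == p)
```

The last line illustrates the remark that adding any constant still gives a valid answer, because the constant term disappears on differentiation.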

page  21,  problem  10: 

The function v = (4/3)π(ct)^3 looks scary and complicated, but it's
nothing more than a constant multiplied by t^3, if we rewrite it as v =
[(4/3)πc^3] t^3. The whole thing in square brackets is simply one big
constant, which just comes along for the ride when we differentiate.
The result is dv/dt = [(4/3)πc^3] (3t^2), or, simplifying, dv/dt = (4πc^3) t^2. (For
further physical insight, we can factor this as [4π(ct)^2] c, where ct is the
radius of the expanding sphere, and the part in brackets is the sphere's
surface area.)

For purposes of checking the units, we can ignore the unitless
constant 4π, which just leaves c^3 t^2. This has units of
(meters per second)^3 (seconds)^2, which works out to be cubic meters per
second. That makes sense, because it tells us how quickly a volume is
increasing over time.

page  21,  problem  11: 

This is similar to problem 10, in that it looks scary, but we can rewrite
it as a simple monomial, K = (1/2)mv^2 = (1/2)m(at)^2 = (ma^2/2)t^2.
The derivative is (ma^2/2)(2t) = ma^2 t. The car needs more and more
power to accelerate as its speed increases.

To check the units, we just need to show that the expression ma^2 t has
units that are like those of the original expression for K, but divided
by seconds, since it's a rate of change of K over time. This indeed
works out, since the only change in the factors that aren't unitless is
the reduction of the power of t from 2 to 1.

page  22,  problem  12: 

The area is $a = \ell^2 = (1 + \alpha T)^2\ell_0^2$. To make this into
something we know how to differentiate, we need to square out the expression
involving $T$, and make it into something that is expressed explicitly as a
polynomial:

$$a = \ell_0^2 + 2\ell_0^2\alpha T + \ell_0^2\alpha^2T^2$$

Now this is just like problem 5, except that the constants superficially
look more complicated. The result is

\begin{align*}
\dot{a} &= 2\ell_0^2\alpha + 2\ell_0^2\alpha^2T \\
        &= 2\ell_0^2(\alpha + \alpha^2T).
\end{align*}


We expect the units of the result to be area per unit temperature, e.g.,
square meters per degree. This is a little tricky, because we have to
figure out what units are implied for the constant $\alpha$. Since the
question talks about $1 + \alpha T$, apparently the quantity $\alpha T$ is
unitless. (The 1 is unitless, and you can't add things that have different
units.) Therefore the units of $\alpha$ must be "per degree," or inverse
degrees. It wouldn't make sense to add $\alpha$ and $\alpha^2T$ unless they
had the same units (and you can check for yourself that they do), so the
whole thing inside the parentheses must have units of inverse degrees.
Multiplying by the $\ell_0^2$ in front, we have units of area per degree,
which is what we expected.

page  22,  problem  13: 

The first derivative is $6t^2 - 1$. Going again, the answer is $12t$.

page  22,  problem  14: 

The first derivative is $3t^2 + 2t$, and the second is $6t + 2$. Setting
this equal to zero and solving for $t$, we find $t = -1/3$. Looking at the
graph, it does look like the concavity is down for $t < -1/3$, and up for
$t > -1/3$.


c / Problem 14.

page 22, problem 15:

I chose $k = -1$, and $t = 1$. In other words, I'm going to check the slope
of the function $x = t^{-1} = 1/t$ at $t = 1$, and see whether it really
equals $kt^{k-1} = -1$. Before even doing the graph, I note that the sign
makes sense: the function $1/t$ is decreasing for $t > 0$, so its slope
should indeed be negative.


d / Problem 15.


The  tangent  line  seems  to  connect  the  points  (0,2)  and  (2,0),  so  its  slope 
does  indeed  look  like  it’s  —1. 

The problem asked us to consider the logical meaning of the two possible
outcomes. If the slope had been significantly different from $-1$
given the accuracy of our result, the conclusion would have been that it
was incorrect to extend the rule to negative values of $k$. Although our
example did come out consistent with the rule, that doesn't prove the
rule in general. An example can disprove a conjecture, but can't prove
it. Of course, if we tried lots and lots of examples, and they all worked,
our confidence in the conjecture would be increased.

page  22,  problem  16: 

A  minimum  would  occur  where  the  derivative  was  zero.  First  we  rewrite 
the  function  in  a  form  that  we  know  how  to  differentiate: 

$$E(r) = ka^{12}r^{-12} - 2ka^6r^{-6}$$

We're told to have faith that the derivative of $t^k$ is $kt^{k-1}$ even
for $k < 0$, so

\begin{align*}
0 &= \dot{E} \\
  &= -12ka^{12}r^{-13} + 12ka^6r^{-7}
\end{align*}

To simplify, we divide both sides by $12k$. The left side was already zero,
so it keeps being zero.

\begin{align*}
0 &= -a^{12}r^{-13} + a^6r^{-7} \\
a^{12}r^{-13} &= a^6r^{-7} \\
a^{12} &= a^6r^6 \\
a^6 &= r^6 \\
r &= \pm a
\end{align*}


To  check  that  this  is  a  minimum,  not  a  maximum  or  a  point  of  inflection, 
one  method  is  to  construct  a  graph.  The  constants  a  and  k  are  irrelevant 
to  this  issue.  Changing  a  just  rescales  the  horizontal  r  axis,  and  changing 
k  does  the  same  for  the  vertical  E  axis.  That  means  we  can  arbitrarily 
set  a  =  1  and  k  =  1,  and  construct  the  graph  shown  in  the  figure.  The 
points  r  =  ±a  are  now  simply  r  =  ±1.  From  the  graph,  we  can  see 
that  they’re  clearly  minima.  Physically,  the  minimum  at  r  =  —a  can 
be  interpreted  as  the  same  physical  configuration  of  the  molecule,  but 
with  the  positions  of  the  atoms  reversed.  It  makes  sense  that  r  =  —a 
behaves  the  same  as  r  =  a,  since  physically  the  behavior  of  the  system 
has  to  be  symmetric,  regardless  of  whether  we  view  it  from  in  front  or 
from  behind. 


The  other  method  of  checking  that  r  =  a  is  a  minimum  is  to  take  the 
second  derivative.  As  before,  the  values  of  a  and  k  are  irrelevant,  and 
can  be  set  to  1.  We  then  have 

\begin{align*}
\dot{E} &= -12r^{-13} + 12r^{-7} \\
\ddot{E} &= 156r^{-14} - 84r^{-8}.
\end{align*}

e / Problem 16.

Plugging in $r = \pm 1$, we get a positive result, which confirms that the
concavity is upward.
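
Both checks are easy to repeat numerically. The following is only a sketch, with a = 1 and k = 1 as in the discussion above; the function names are hypothetical.

```python
def energy(r, a=1.0, k=1.0):
    """E(r) = k a^12 r^-12 - 2 k a^6 r^-6."""
    return k * a ** 12 * r ** -12 - 2.0 * k * a ** 6 * r ** -6

def second_derivative(r):
    """Second derivative of E with a = k = 1: 156 r^-14 - 84 r^-8."""
    return 156.0 * r ** -14 - 84.0 * r ** -8

for r in (1.0, -1.0):
    # A positive second derivative means the graph is concave up there.
    print(r, energy(r), second_derivative(r))

# The energy just to either side of r = 1 should be higher than at r = 1.
print(energy(0.9), energy(1.1))
```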

page  22,  problem  17: 

Since polynomials don't have kinks or endpoints in their graphs, the
maxima and minima must be points where the derivative is zero.
Differentiation bumps down all the powers of a polynomial by one, so the
derivative of a third-order polynomial is a second-order polynomial. A
second-order polynomial can have at most two real roots (values of $t$ for
which it equals zero), which are given by the quadratic formula. (If the
number inside the square root in the quadratic formula is zero or
negative, there could be fewer than two real roots.) That means a
third-order polynomial can have at most two maxima or minima.

page  22,  problem  18: 

Since $f$, $g$, and $s$ are smooth and defined everywhere, any extrema they
possess occur at places where their derivatives are zero. The converse is
not necessarily true, however; a place where the derivative is zero could
be a point of inflection. The derivative is additive, so if both $f$ and $g$
have zero derivatives at a certain point, $s$ does as well. Therefore in
most cases, if $f$ and $g$ both have an extremum at a point, so will $s$.
However, it could happen that this is only a point of inflection for $s$, so
in general, we can't conclude anything about the extrema of $s$ simply
from knowing where the extrema of $f$ and $g$ occur.

Going the other direction, we certainly can't infer anything about
extrema of $f$ and $g$ from knowledge of $s$ alone. For example, if
$s(x) = x^2$, with a minimum at $x = 0$, that tells us very little about $f$
and $g$. We could have, for example, $f(x) = (x-1)^2/2 - 2$ and
$g(x) = (x+1)^2/2 + 1$, neither of which has an extremum at $x = 0$.

page  22,  problem  19: 

Considering  V  as  a  function  of  h,  with  b  treated  as  a  constant,  we  have 
for  the  slope  of  its  graph 


$$\dot{V} = \frac{e_V}{e_h},$$

so

$$e_V = \dot{V} \cdot e_h$$

=  \be. 


page  23,  problem  20: 

Thinking of the rocket's height as a function of time, we can see that
the goal is to measure the function at its maximum. The derivative is zero
at  the  maximum,  so  the  error  incurred  due  to  timing  is  approximately 
zero.  She  should  not  worry  about  the  timing  error  too  much.  Other 
factors  are  likely  to  be  more  important,  e.g.,  the  rocket  may  not  rise 
exactly  vertically  above  the  launchpad. 

page 23, problem 21: If $\dot{x} = n^2$, and $x$ is a polynomial in $n$, then
we must have $\dot{x}(n) = x(n) - x(n-1) = n^2$. If $x$ is a polynomial of
order $k$, then $x(n)$ and $x(n-1)$ both have $n^k$ terms with the same
coefficient, so $\dot{x}$ has no $n^k$ term. We want $\dot{x}$ to have a
nonvanishing $n^2$ term, so we must have $k \ge 3$. For $k > 3$, it's easy
to show that the $n^{k-1}$ term in $x(n) - x(n-1)$ is nonzero, which would
give $\dot{x}$ a term of higher order than $n^2$, so we must have $k = 3$.
Let $x(n) = an^3 + bn^2 + \ldots$, where $a$ is the coefficient that we want
to prove is 1/3, and $\ldots$ represents lower-order terms. By the binomial
theorem, we have $x(n-1) = an^3 - 3an^2 + bn^2 + \ldots$, and subtracting
this from $x(n)$ gives $\dot{x}(n) = 3an^2 + \ldots$. Since $3a = 1$, we have
$a = 1/3$.
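
A brute-force check makes the same point less abstractly. The sketch below (the function name is hypothetical) adds up the squares directly and divides by $n^3$; if the leading coefficient really is 1/3, the ratio should creep toward 0.333... as $n$ grows.

```python
def sum_of_squares(n):
    """Add up 1^2 + 2^2 + ... + n^2 by brute force."""
    return sum(j * j for j in range(1, n + 1))

for n in (10, 100, 1000):
    # Ratio of the running sum to n^3; this should approach 1/3.
    print(n, sum_of_squares(n) / n ** 3)
```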

Solutions  for  chapter  2 


page  47,  problem  1: 
\begin{align*}
\frac{dx}{dt} &= \frac{(t + dt)^4 - t^4}{dt} \\
              &= \frac{4t^3\,dt + 6t^2\,dt^2 + 4t\,dt^3 + dt^4}{dt} \\
              &= 4t^3 + \ldots,
\end{align*}

where $\ldots$ indicates infinitesimal terms. The derivative is the standard
part of this, which is $4t^3$.

page  47,  problem  2: 

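$$\frac{dx}{dt} = \frac{\cos(t + dt) - \cos t}{dt}$$

The identity $\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$
then gives

$$\frac{dx}{dt} = \frac{\cos t\cos dt - \sin t\sin dt - \cos t}{dt}$$

The small-angle approximations $\cos dt \approx 1$ and $\sin dt \approx dt$
result in

\begin{align*}
\frac{dx}{dt} &= \frac{-\sin t\,dt}{dt} \\
              &= -\sin t.
\end{align*}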


page  47,  problem  3: 

H                  result
1000               .032
1,000,000          0.0010
1,000,000,000      0.00032

The  result  is  getting  smaller  and  smaller,  so  it  seems  reasonable  to  guess 
that  if  H  is  infinite,  the  expression  gives  an  infinitesimal  result. 

page  47,  problem  4: 


dx         $\sqrt{dx}$
.1         .32
.001       .032
.00001     .0032

The square root is getting smaller, but is not getting smaller as fast as
the number itself. In proportion to the original number, the square root
is actually getting bigger. It looks like $\sqrt{dx}$ is infinitesimal, but
it's still infinitely big compared to $dx$. This makes sense, because
$\sqrt{dx}$ equals $dx^{1/2}$. We already knew that $dx^0$, which equals 1,
was infinitely big compared to $dx^1$, which equals $dx$. In the hierarchy
of infinitesimals, $dx^{1/2}$ fits in between $dx^0$ and $dx^1$.
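
The table, along with the ratio discussed above, can be regenerated with a few lines of code. This is just a sketch; the function name is my own.

```python
import math

def root_to_number_ratio(dx):
    """Size of sqrt(dx) in proportion to dx itself."""
    return math.sqrt(dx) / dx

for dx in (1e-1, 1e-3, 1e-5):
    # Columns: dx, its square root, and the square root relative to dx.
    print(dx, math.sqrt(dx), root_to_number_ratio(dx))
```

The last column grows as dx shrinks, which is the numerical version of the statement that the square root is infinitely big compared to dx.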

page  47,  problem  5: 

Statements  (a)-(d),  and  (f)-(g)  are  all  valid  for  the  hyperreals,  because 
they  meet  the  test  of  being  directly  translatable,  without  having  to 
interpret  the  meaning  of  things  like  particular  subsets  of  the  reals  in  the 
context  of  the  hyperreals. 

Statement  (e),  however,  refers  to  the  rational  numbers,  a  particular 
subset  of  the  reals,  and  that  means  that  it  can’t  be  mindlessly  translated 
into  a  statement  about  the  hyperreals,  unless  we  had  figured  out  a  way 
to  translate  the  set  of  rational  numbers  into  some  corresponding  subset 
of  the  hyperreal  numbers  like  the  hyperrationals!  This  is  not  the  type  of 
statement  that  the  transfer  principle  deals  with.  The  statement  is  not 
true  if  we  try  to  change  “real”  to  “hyperreal”  while  leaving  “rational” 
alone;  for  example,  it’s  not  true  that  there’s  a  rational  number  that  lies 
between  the  hyperreal  numbers  0  and  0  +  dx,  where  dx  is  infinitesimal. 

page 47, problem 6: If $R_1$ is finite and $R_2$ infinite, then $1/R_2$ is
infinitesimal, $1/R_1 + 1/R_2$ differs infinitesimally from $1/R_1$, and the
combined resistance $R$ differs infinitesimally from $R_1$. Physically, the
second pipe is blocked or too thin to carry any significant flow, so it's
as though it weren't present.

If $R_1$ is finite and $R_2$ is infinitesimal, then $1/R_2$ is infinite,
$1/R_1 + 1/R_2$ is also infinite, and the combined resistance $R$ is
infinitesimal. It's so easy for water to flow through $R_2$ that $R_1$ might
as well not be present. In the context of electrical circuits rather than
water pipes, this is known as a short circuit.
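
We can't type an infinite or infinitesimal number into a computer, but very large and very small finite values show the same trend. The sketch below assumes the parallel-combination formula $1/R = 1/R_1 + 1/R_2$ used above; the numerical values are arbitrary stand-ins.

```python
def combined_resistance(r1, r2):
    """Two resistances in parallel: 1/R = 1/R1 + 1/R2."""
    return 1.0 / (1.0 / r1 + 1.0 / r2)

r1 = 5.0
# A huge R2 stands in for "infinite": the result stays close to R1.
print(combined_resistance(r1, 1e12))
# A tiny R2 stands in for "infinitesimal": the result is close to R2.
print(combined_resistance(r1, 1e-12))
```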

page 48, problem 7: The velocity addition is only interesting if the
infinitesimal velocities $u$ and $v$ are comparable to one another, i.e.,
their ratio is finite. Let's write $\epsilon$ for the size of these
infinitesimals, so that both $u$ and $v$ can be written as $\epsilon$
multiplied by some finite number. Then $1 + uv$ differs from 1 by an amount
that is on the order of $\epsilon^2$, which is infinitesimally small
compared to $\epsilon$. The same then holds true for $1/(1 + uv)$ as well.
The result of velocity addition $(u + v)/(1 + uv)$ is then $u + v + \ldots$,
where $\ldots$ represents quantities of order $\epsilon^3$, which amount to
a correction that is infinitesimally small compared to the
nonrelativistic result $u + v$.
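
The claim about the order of the correction can be tested with small finite numbers. In the sketch below, the choice $u = 2\epsilon$, $v = 3\epsilon$ is an arbitrary assumption; if the correction is of order $\epsilon^3$, then dividing it by $\epsilon^3$ should give a number that settles down as $\epsilon$ shrinks.

```python
def add_velocities(u, v):
    """Relativistic velocity addition, in units where c = 1."""
    return (u + v) / (1.0 + u * v)

def scaled_correction(eps):
    """(relativistic result - nonrelativistic result) / eps^3."""
    u, v = 2.0 * eps, 3.0 * eps  # arbitrary finite multiples of eps
    return (add_velocities(u, v) - (u + v)) / eps ** 3

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, scaled_correction(eps))
```

For this choice of $u$ and $v$ the scaled correction tends toward $-(u+v)uv/\epsilon^3 = -30$.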

page 48, problem 8: This would be a horrible problem if we had to
expand this as a polynomial with 101 terms, as in chapter 1! But now
we know the chain rule, so it's easy. The derivative is

$$[100(2x + 3)^{99}][2],$$

where the first factor in brackets is the derivative of the function on
the outside, and the second one is the derivative of the "inside stuff."
Simplifying a little, the answer is $200(2x + 3)^{99}$.

page  48,  problem  9: 

Applying  the  product  rule,  we  get 

$$100(x + 1)^{99}(x + 2)^{200} + 200(x + 1)^{100}(x + 2)^{199}.$$

(The  chain  rule  was  also  required,  but  in  a  trivial  way  —  for  both  of 
the  factors,  the  derivative  of  the  “inside  stuff”  was  one.) 

page  48,  problem  10: 

The derivative of $e^{7x}$ is $e^{7x} \cdot 7$, where the first factor is the
derivative of the outside stuff (the derivative of a base-$e$ exponential is
just the same thing), and the second factor is the derivative of the inside
stuff. This would normally be written as $7e^{7x}$.

The derivative of the second function is $e^{e^x}e^x$, with the second
exponential factor coming from the chain rule.

page  48,  problem  11: 

We need to put together three different ideas here: (1) When a function
to be differentiated is multiplied by a constant, the constant just comes
along for the ride. (2) The derivative of the sine is the cosine. (3) We
need to use the chain rule. The result is $ab\cos(bx + c)$.

page  48,  problem  13: 

If we just wanted to find the integral of $\sin x$, the answer would be
$-\cos x$ (or $-\cos x$ plus an arbitrary constant), since the derivative
would be $-(-\sin x)$, which would take us back to the original function.
The obvious thing to guess for the integral of $a\sin(bx + c)$ would
therefore be $-a\cos(bx + c)$, which almost works, but not quite. The
derivative of this function would be $ab\sin(bx + c)$, with the pesky factor
of $b$ coming from the chain rule. Therefore what we really wanted was the
function

$$-(a/b)\cos(bx + c).$$
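
Differentiating numerically is a cheap way to make sure the factor of $b$ ended up in the right place. The sketch below uses arbitrary sample values for $a$, $b$, $c$, and $x$ (all assumptions), and compares a finite-difference slope of the proposed integral with the function we were supposed to integrate.

```python
import math

def proposed_integral(x, a, b, c):
    """The answer found above: -(a/b) cos(bx + c)."""
    return -(a / b) * math.cos(b * x + c)

def integrand(x, a, b, c):
    """The function we were asked to integrate: a sin(bx + c)."""
    return a * math.sin(b * x + c)

def slope(f, x, h=1e-6):
    """Centered finite-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

a, b, c, x = 2.0, 3.0, 0.5, 0.7  # arbitrary sample values
numeric = slope(lambda s: proposed_integral(s, a, b, c), x)
print(numeric, integrand(x, a, b, c))
```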

page  48,  problem  14: 

The  chain  rule  gives 


$$\frac{d}{dx}((x^2)^2)^2 = 2((x^2)^2)(2(x^2))(2x) = 8x^7,$$

which is the same as the result we would have gotten by differentiating
$x^8$.

page  48,  problem  15: 

To find a maximum, we take the derivative and set it equal to zero. The
whole factor of $2v^2/g$ in front is just one big constant, so it comes
along for the ride. To differentiate the factor of $\sin\theta\cos\theta$,
we need to use the product rule, plus the fact that the derivative of sin
is cos, and the derivative of cos is $-\sin$.

\begin{align*}
0 &= \frac{2v^2}{g}(\cos\theta\cos\theta + \sin\theta(-\sin\theta)) \\
0 &= \cos^2\theta - \sin^2\theta \\
\cos\theta &= \pm\sin\theta
\end{align*}

We're interested in angles between 0 and 90 degrees, for which both
the sine and the cosine are positive, so

\begin{align*}
\cos\theta &= \sin\theta \\
\tan\theta &= 1 \\
\theta &= 45^\circ.
\end{align*}

To check that this is really a maximum, not a minimum or an inflection
point, we could resort to the second derivative test, but we know the
graph of $R(\theta)$ is zero at $\theta = 0$ and $\theta = 90^\circ$, and
positive in between, so this must be a maximum.
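
A brute-force scan over angles gives the same answer. In this sketch the constant factor $2v^2/g$ is dropped, since it doesn't affect where the maximum occurs; the function name and the one-degree step are my own choices.

```python
import math

def relative_range(theta_deg):
    """Range divided by the constant 2v^2/g, i.e. sin(theta) cos(theta)."""
    theta = math.radians(theta_deg)
    return math.sin(theta) * math.cos(theta)

# Try every whole number of degrees strictly between 0 and 90.
best_angle = max(range(1, 90), key=relative_range)
print(best_angle, relative_range(best_angle))
```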

page  48,  problem  17: 

Taking the derivative and setting it equal to zero, we have
$(e^x - e^{-x})/2 = 0$, so $e^x = e^{-x}$, which occurs only at $x = 0$. The
second derivative is $(e^x + e^{-x})/2$ (the same as the original function),
which is positive for all $x$, so the function is everywhere concave up, and
this is a minimum.

page  49,  problem  18: 

There are no kinks, endpoints, etc., so extrema will occur only in places
where the derivative is zero. Applying the chain rule, we find the
derivative to be $\cos(\sin(\sin x))\cos(\sin x)\cos x$. This will be zero
if any of the three factors is zero. We have $\cos u = 0$ only when
$|u| \ge \pi/2$, and $\pi/2$ is greater than 1, so it's not possible for
either of the first two factors to equal zero. The derivative will therefore
equal zero if and only if $\cos x = 0$, which happens in the same places
where the derivative of $\sin x$ is zero, at $x = \pi/2 + \pi n$, where $n$
is an integer.

f / Problem 18.

This  essentially  completes  the  required  demonstration,  but  there  is  one 
more  technical  issue,  which  is  that  it’s  conceivable  that  some  of  these 
could  be  points  of  inflection.  Constructing  a  graph  of  sin(sin(sin  x)) 
gives  us  the  necessary  insight  to  see  that  this  can’t  be  the  case.  The 
function  essentially  looks  like  the  sine  function,  but  its  extrema  have 
been  “shaved  down”  a  little,  giving  them  slightly  flatter  tips  that  don’t 
quite  extend  out  to  ±1.  It’s  therefore  fairly  clear  that  these  aren’t  points 
of  inflection.  To  prove  this  more  rigorously,  we  could  take  the  second 
derivative  and  show  that  it  was  nonzero  at  the  places  where  the  first 
derivative  is  zero.  That  would  be  messy.  A  less  tedious  argument  is 
as follows. We can tell from its formula that the function is periodic,
i.e., it has the property that $f(x + \ell) = f(x)$, for $\ell = 2\pi$. This
follows because the innermost sine function is periodic, and the outer
layers only depend on the result of the inner layer. Therefore all the
points of the form $\pi/2 + 2\pi n$ have the same behavior. Either they're
all maxima
or  they’re  all  points  of  inflection.  But  clearly  a  function  can’t  oscillate 
back  and  forth  without  having  any  maxima  at  all,  so  they  must  all  be 
maxima.  A  similar  argument  applies  to  the  minima. 

page  49,  problem  19: 

The function $f$ has a kink at $x = 0$, so it has no uniquely defined tangent
line there, and its derivative at that point is undefined. In terms of
infinitesimals, positive values of $dx$ give $df/dx = (dx + dx)/dx = 2$,
while negative ones give $df/dx = (-dx + dx)/dx = 0$. Since the
standard part of the quotient $df/dx$ depends on the specific value of
$dx$, the derivative is undefined.

The function $g$ has no kink at $x = 0$. The graph of $x|x|$ looks like two
half-parabolas glued together, and since both of them have slopes of 0
at $x = 0$, the slope of the tangent line to this part of the function is
well defined, and is zero; the remaining term $x$ contributes a slope of 1.
In terms of infinitesimals, $dg/dx$ is the standard part of $|dx| + 1$,
which is 1.

page  49,  problem  20: 

(a) As suggested, let $c = \sqrt{g/A}$, so that $d = A\ln\cosh ct =
A\ln[(e^{ct} + e^{-ct})/2]$. Applying the chain rule, the velocity is

$$A\,\frac{ce^{ct} - ce^{-ct}}{e^{ct} + e^{-ct}}.$$

(b) The expression can be rewritten as $Ac\tanh ct$.

(c) For large $t$, the $e^{-ct}$ terms become negligible, so the velocity is
$Ace^{ct}/e^{ct} = Ac$.

(d) From the original expression, $A$ must have units of distance, since the
logarithm is unitless. Also, since $ct$ occurs inside a function, $ct$ must
be unitless, which means that $c$ has units of inverse time. The answers to
parts b and c get their units from the factors of $Ac$, which have units of
distance multiplied by inverse time, or velocity.

page  49,  problem  21: 

Since  I’ve  advocated  not  memorizing  the  quotient  rule,  I’ll  do  this  one 
from  first  principles,  using  the  product  rule. 


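\begin{align*}
\frac{d}{d\theta}\tan\theta
  &= \frac{d}{d\theta}\left(\frac{\sin\theta}{\cos\theta}\right) \\
  &= \frac{d}{d\theta}\left[\sin\theta\,(\cos\theta)^{-1}\right] \\
  &= \cos\theta\,(\cos\theta)^{-1} + (\sin\theta)(-1)(\cos\theta)^{-2}(-\sin\theta) \\
  &= 1 + \tan^2\theta
\end{align*}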

(Using  a  trig  identity,  this  can  also  be  rewritten  as  sec2  9.) 

page  49,  problem  22: 

Reexpressing $\sqrt[3]{x}$ as $x^{1/3}$, the derivative is $(1/3)x^{-2/3}$.

page  49,  problem  23: 

(a) Using the chain rule, the derivative of $(x^2 + 1)^{1/2}$ is
$(1/2)(x^2 + 1)^{-1/2}(2x) = x(x^2 + 1)^{-1/2}$.

(b) This is the same as a, except that the 1 is replaced with an $a^2$, so
the answer is $x(x^2 + a^2)^{-1/2}$. The idea would be that $a$ has the same
units as $x$.

(c) This can be rewritten as $(a + x)^{-1/2}$, giving a derivative of
$(-1/2)(a + x)^{-3/2}$.

(d) This is similar to c, but we pick up a factor of $-2x$ from the chain
rule, making the result $ax(a - x^2)^{-3/2}$.

page 49, problem 24:

By the chain rule, the result is $2/(2t + 1)$.

page  49,  problem  25: 

Using the product rule, we have

$$\left(\frac{d}{dx}3\right)\sin x + 3\left(\frac{d}{dx}\sin x\right),$$

but the derivative of a constant is zero, so the first term goes away, and
we get $3\cos x$, which is what we would have had just from the usual
method of treating multiplicative constants.

page  49,  problem  26: 


N(Gamma(2))
1
N(Gamma(2.00001))
1.0000042278
N( (1.0000042278-1)/(.00001) )
0.4227799998

Probably  only  the  first  few  digits  of  this  are  reliable. 
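
For readers without access to the computer algebra system used above, roughly the same estimate can be sketched in Python, whose standard library provides the gamma function as `math.gamma`. The step size of 0.00001 matches the one used above; the variable names are my own.

```python
import math

h = 0.00001  # same small step as in the session above

# Slope of the gamma function near 2, estimated from two nearby points.
slope = (math.gamma(2.0 + h) - math.gamma(2.0)) / h
print(slope)
```

As with the result above, only the first few digits should be trusted, since the estimate comes from a small but finite step.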

page  50,  problem  27: 

The area and volume are

$$A = 2\pi r\ell + 2\pi r^2$$

and

$$V = \pi r^2\ell.$$

The strategy is to use the equation for $A$, which is a constant, to
eliminate the variable $\ell$, and then maximize $V$ in terms of $r$.

$$\ell = (A - 2\pi r^2)/2\pi r$$

Substituting this expression for $\ell$ back into the equation for $V$,

$$V = \frac{1}{2}rA - \pi r^3.$$

To maximize this with respect to $r$, we take the derivative and set it
equal to zero.

\begin{align*}
0 &= \frac{1}{2}A - 3\pi r^2 \\
A &= 6\pi r^2 \\
\ell &= (6\pi r^2 - 2\pi r^2)/2\pi r \\
\ell &= 2r
\end{align*}

In other words, the length should be the same as the diameter.
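
Here is a brute-force version of the same optimization, as a sketch. The fixed area $A = 6\pi$ is an arbitrary choice that happens to put the optimum at $r = 1$, and the grid spacing of 0.001 is also just an assumption.

```python
import math

A = 6.0 * math.pi  # fixed surface area (arbitrary choice)

def volume(r):
    """Volume with the length eliminated: V = rA/2 - pi r^3."""
    return 0.5 * r * A - math.pi * r ** 3

# Scan the radii for which the length (A - 2 pi r^2)/(2 pi r) is positive.
radii = [i / 1000.0 for i in range(1, 1733)]
best_r = max(radii, key=volume)
best_length = (A - 2.0 * math.pi * best_r ** 2) / (2.0 * math.pi * best_r)
print(best_r, best_length, best_length / best_r)
```

The last number printed is the ratio of the length to the radius, which should come out very close to 2.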

page  50,  problem  28: 

(a) We can break the expression down into three factors: the constant
$m/2$ in front, the nonrelativistic velocity dependence $v^2$, and the
relativistic correction factor $(1 - v^2/c^2)^{-1/2}$. Rather than
substituting in $at$ for $v$, it's a little less messy to calculate
$dK/dt = (dK/dv)(dv/dt) = a\,dK/dv$. Using the product rule, we have

$$\frac{dK}{dt} = ma^2t\left[\left(1 - \frac{v^2}{c^2}\right)^{-1/2}
  + \frac{v^2}{2c^2}\left(1 - \frac{v^2}{c^2}\right)^{-3/2}\right].$$


(b)  The  expression  ma2t  is  the  nonrelativistic  (classical)  result,  and  has 
the  correct  units  of  kinetic  energy  divided  by  time.  The  factor  in  square 
brackets  is  the  relativistic  correction,  which  is  unitless. 

(c)  As  v  gets  closer  and  closer  to  c,  the  expression  1  —  v2/c2  approaches 
zero,  so  both  the  terms  in  the  relativistic  correction  blow  up  to  positive 
infinity. 

page  50,  problem  29: 

We already know it works for positive $x$, so we only need to check it
for negative $x$. For negative values of $x$, the chain rule tells us that
the derivative is $1/|x|$, multiplied by $-1$, since $d|x|/dx = -1$. This
gives $-1/|x|$, which is the same as $1/x$, since $x$ is assumed negative.

page  50,  problem  30: 

Since $f(x) = f(-x)$,

$$\frac{df(x)}{dx} = \frac{df(-x)}{dx}.$$

But by the chain rule, the right-hand side equals $-f'(-x)$, as claimed.

page  50,  problem  32: 

Let $f = dx^k/dx$ be the unknown function. Then

\begin{align*}
1 &= \frac{dx}{dx} \\
  &= \frac{d}{dx}\left(x^kx^{-k+1}\right) \\
  &= fx^{-k+1} + x^k(-k+1)x^{-k},
\end{align*}

where we can use the ordinary rule for derivatives of powers on $x^{-k+1}$,
since $-k + 1$ is positive. Solving for $f$, we have the desired result.

page  50,  problem  33:  Since  the  parallel  postulate  can  be  expressed 
in  terms  of  algebra  through  Cartesian  geometry,  the  transfer  principle 
tells  us  that  it  holds  for  F  as  well.  But  G  is  defined  in  terms  of  the 
finite  hyperreals,  so  statements  about  E  don’t  carry  over  to  statements 
about  G  simply  by  replacing  “real”  with  “hyperreal,”  and  the  transfer 
principle  does  not  guarantee  that  the  parallel  postulate  applies  to  G. 

In fact, it is easy to find a counterexample in G. Let $\epsilon$ be an
infinitesimal number. Consider the lines with equations $y = 1$ and
$y = 1 + \epsilon x$. Neither of these intersects the $x$ axis.

No, it is not valid to associate only E with the plane described by
Euclid's axioms. All of Euclid's axioms hold equally well in F. F is referred
to as a nonstandard model of Euclid's axioms. It has the same relation
to standard Euclidean geometry as the hyperreals have to the reals. If
we want to make up a set of axioms that describes E and can't describe
F, then we need to add an additional axiom to Euclid's set. An example
of such an axiom would be an axiom stating that given any two line
segments with lengths $\ell_1$ and $\ell_2$, there exists some integer $n$
such that $n\ell_1 > \ell_2$. Note that although this axiom holds in E, the
transfer principle cannot be used to show that it holds in F; in fact, it is
false in F. The transfer principle doesn't apply to statements that include
phrases such as "for any integer."

page  51,  problem  34: 

The  normal  definition  of  a  repeating  decimal  such  as  0.999  ...  is  that  it 
is  the  limit  of  the  sequence  0.9,  0.99,  . . .,  and  the  limit  is  a  real  number, 
by  definition.  0.999  . . .  equals  1.  However,  there  is  an  intuition  that  the 
limiting  process  0.9,  0.99,  . . .  “never  quite  gets  there.”  This  intuition 
can,  in  fact,  be  formalized  in  the  construction  described  beginning  on 
page 144; we can define a hyperreal number based on the sequence
0.9,  0.99,  . . .,  and  it  is  a  number  infinitesimally  less  than  one.  This  is 
not,  however,  the  normal  way  of  defining  the  symbol  0.999  . . .,  and  we 
probably  wouldn’t  want  to  change  the  definition  so  that  it  was.  If  it 
was,  then  0.333 . . .  would  not  equal  1/3. 

page  51,  problem  35: 

Converting  these  into  Leibniz  notation,  we  find 

d/  =  dg 
dx  d  h 


and 


d/  =  do 

dx  d  h 

To prove something is not true in general, it suffices to find one
counterexample. Suppose that $g$ and $h$ are both unitless, and $x$ has units
of seconds. The value of $f$ is defined by the output of $g$, so $f$ must
also be unitless. Since $f$ is unitless, $df/dx$ has units of inverse
seconds ("per second"). But this doesn't match the units of either of the
proposed expressions, because they're both unitless. The correct chain
rule, however, works. In the equation

$$\frac{df}{dx} = \frac{dg}{dh}\frac{dh}{dx},$$

the right-hand side consists of a unitless factor multiplied by a factor
with units of inverse seconds, so its units are inverse seconds, matching
the left-hand side.

page  51,  problem  36: 

We can make life a lot easier by observing that the function $s(f)$ will
be maximized when the expression inside the square root is minimized.
Also, since $f$ is squared every time it occurs, we can change to a variable
$x = f^2$, and then once the optimal value of $x$ is found we can take its
square root in order to find the optimal $f$. The function to be optimized
is then

$$a(x - f_0^2)^2 + bx.$$

Differentiating this and setting the derivative equal to zero, we find

$$2a(x - f_0^2) + b = 0,$$

which results in $x = f_0^2 - b/2a$, or

$$f = \sqrt{f_0^2 - b/2a},$$

(choosing the positive root, since $f$ represents a frequency, and
frequencies are positive by definition). Note that the quantity inside the
square root involves the square of a frequency, but then we take its
square root, so the units of the result turn out to be frequency, which
makes sense. We can see that if $b$ is small, the second term is small, and
the maximum occurs very nearly at $f_0$.

There is one subtle issue that was glossed over above, which is that
the graph on page 51 shows two extrema: a minimum at $f = 0$ and a
maximum at $f > 0$. What happened to the $f = 0$ minimum? The issue
is that I was a little sloppy with the change of variables. Let $I$ stand
for the quantity inside the square root in the original expression for $s$.
Then by the chain rule,

$$\frac{ds}{df} = \frac{ds}{dI}\frac{dI}{dx}\frac{dx}{df}.$$

We looked for the place where $dI/dx$ was zero, but $ds/df$ could also
be zero if one of the other factors was zero. This is what happens at
$f = 0$, where $dx/df = 0$.

page  51,  problem  37: 


y 


-i 


Q-tts)’ 

/(1“  l  +  iz/l) 


Applying the geometric series $1/(1 + r) = 1 - r + r^2 - \ldots$,


P 


dx 


y 

As checks on our result, we note that the units work out correctly
(meters squared divided by meters give meters), and that the result is
indeed large, since we divide by the small quantity $dx$.

page 52, problem 38: One way to evaluate an expression like $a^b$ is by
using the identity $a^b = e^{b\ln a}$. If we try to substitute $a = 1$ and
$b = \infty$, we get $e^{\infty\cdot 0}$, which has an indeterminate form
inside the exponential. One way to express the idea is that if there is even
the tiniest error in the value of $a$, the value of $a^\infty$ can have any
positive value.

Solutions  for  chapter  3 


page  68,  problem  1: 

(a) The Weierstrass definition requires that if we're given a particular
$\epsilon$, we be able to find a $\delta$ so small that $f(x) + g(x)$ differs
from $F + G$ by at most $\epsilon$ for $|x - a| < \delta$. But the
Weierstrass definition also tells us that given $\epsilon/2$, we can find a
$\delta$ such that $f$ differs from $F$ by at most $\epsilon/2$, and likewise
for $g$ and $G$ (taking the smaller of the two values of $\delta$). The
amount by which $f + g$ differs from $F + G$ is then at most
$\epsilon/2 + \epsilon/2$, which completes the proof.

(b) Let $dx$ be infinitesimal. Then the definition of the limit in terms of
infinitesimals says that $f(a + dx)$ differs at most infinitesimally from
$F$, and likewise for $g$ and $G$. This means that $f + g$ differs from
$F + G$ by the sum of two infinitesimals, which is itself an infinitesimal,
and therefore the standard part of $f + g$ evaluated at $a + dx$ equals
$F + G$, satisfying the definition.

page  68,  problem  2: 

The  shape  of  the  graph  can  be  found  by  considering  four  cases:  large 
negative  x,  small  negative  x,  small  positive  x,  and  large  positive  x.  In 
these  four  cases,  the  function  is  respectively  close  to  1,  large,  small,  and 
close  to  1. 


The  four  limits  correspond  to  the  four  cases  described  above. 

page  68,  problem  3:  All  five  of  these  can  be  done  using  l’Hopital’s 
rule: 
g / Problem 2.


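$$\lim_{s \to 1} \frac{s^3 - 1}{s - 1} = \lim \frac{3s^2}{1} = 3$$

$$\lim_{\theta \to 0} \frac{1 - \cos\theta}{\theta^2} = \lim \frac{\sin\theta}{2\theta} = \lim \frac{\cos\theta}{2} = \frac{1}{2}$$

$$\lim_{x \to \infty} \frac{5x^2 - 2x}{x} = \lim \frac{10x - 2}{1} = \infty$$

$$\lim_{n \to \infty} \frac{n(n + 1)}{(n + 2)(n + 3)} = \lim \frac{n^2 + \ldots}{n^2 + \ldots} = \lim \frac{2n + \ldots}{2n + \ldots} = \lim \frac{2}{2} = 1$$

$$\lim_{x \to \infty} \frac{ax^2 + bx + c}{dx^2 + ex + f} = \lim \frac{2ax + \ldots}{2dx + \ldots} = \lim \frac{2a}{2d} = \frac{a}{d}$$
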

In examples 2, 4, and 5, we differentiate more than once in order to get an expression that can be evaluated by substitution. In 4 and 5, $\ldots$ represents terms that we anticipate will go away after the second differentiation. Most people probably would not bother with l’Hopital’s rule for 3, 4, or 5, being content merely to observe the behavior of the highest-order term, which makes the limiting behavior obvious. Examples 3, 4, and 5 can also be done rigorously without l’Hopital’s rule, by algebraic manipulation; we divide on the top and bottom by the highest power of the variable, giving an expression that is no longer an indeterminate form $\infty/\infty$.

page  68,  problem  4: 

Both numerator and denominator go to zero, so we can apply l’Hopital’s rule. Differentiating top and bottom gives $(\cos x - x \sin x)/(-\ln 2 \cdot 2^x)$, which equals $-1/\ln 2$ at $x = 0$. To check this numerically, we plug $x = 10^{-3}$ into the original expression. The result is $-1.44219$, which is very close to $-1/\ln 2 = -1.44269\ldots$.
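
The numerical check can be reproduced with a short Python sketch (Python here, not the Yacas used for the book's own listings). The original expression is not quoted in this solution, so the function below assumes it is $x \cos x / (1 - 2^x)$, which is the form implied by the derivatives of the top and bottom given above.

```python
import math

def quotient(x):
    """Assumed original expression for problem 4: x*cos(x) / (1 - 2**x)."""
    return x * math.cos(x) / (1.0 - 2.0 ** x)

def lhopital_value():
    """Limit predicted by l'Hopital's rule at x = 0: -1/ln 2."""
    return -1.0 / math.log(2.0)

if __name__ == "__main__":
    print(quotient(1e-3))     # numerical estimate at x = 10^-3
    print(lhopital_value())   # limiting value for comparison
```

Making $x$ smaller brings the two numbers closer together, until floating-point cancellation in $1 - 2^x$ starts to dominate.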

page  68,  problem  5: 

L’Hopital’s  rule  only  works  when  both  the  numerator  and  the  denomi¬ 
nator  go  to  zero. 

page 68, problem 6: Applying l’Hopital’s rule once gives

$$\lim_{u \to 0} \frac{2u}{e^u - e^{-u}},$$

which is still an indeterminate form. Applying the rule a second time, we get

$$\lim_{u \to 0} \frac{2}{e^u + e^{-u}} = 1.$$

As a numerical check, plugging $u = 0.01$ into the original expression results in 0.9999917.

page 68, problem 7: L’Hopital’s rule gives $\cos t/1 \to -1$. Plugging in $t = 3.1$ gives $-0.9997$.

page 68, problem 8: Let $u = 1/x$. Then

$$\frac{df/dx}{dg/dx} = \frac{df/du}{dg/du},$$

simply by algebraic manipulation of the infinitesimals. (If we want to interpret these quantities as derivatives, then our notational convention is that they stand for the standard parts of the quotients of the infinitesimals, in which case the equality is only for the standard parts.) This equality holds not just in the limit but everywhere that the functions are differentiable. The expression on the left is the thing whose limit we’re trying to prove equals $\lim f/g$. The right-hand side is equal to $\lim f/g$ by the previously established form of l’Hopital’s rule.

page 68, problem 9: By the definition of continuity in terms of infinitesimals, the function is continuous, because an infinitesimal change $dx$ leads to a change $dy = a\,dx$ in the output of the function which is likewise infinitesimal. (This depends on the fact that $a$ is assumed to be real, which implies that it is finite.)

Continuity in terms of the Weierstrass limit holds because we can take $\delta = \epsilon/a$.
Solutions for chapter 4


page  83,  problem  1: 


// Riemann sum for the integral of Exp(t^2) from a to b
a := 0;
b := 1;
H := 1000;          // number of steps
dt := (b-a)/H;      // width of each step
sum := 0;
t := a;
While (t<=b) [
  sum := N(sum+Exp(t^2)*dt);
  t := N(t+dt);
];
Echo(sum);


The  result  is  1.46. 
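
For readers without Yacas, the same kind of sum can be sketched in Python. This is an illustrative translation under stated assumptions, not the book's listing: it uses exactly `steps` left-endpoint rectangles and an integer loop counter (so that floating-point drift in `t` cannot add or drop a strip), and the function names are invented for the example.

```python
import math

def left_riemann_sum(f, a, b, steps):
    """Approximate the integral of f from a to b with left-endpoint rectangles."""
    dt = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + i * dt        # left edge of the i-th strip
        total += f(t) * dt    # area of one thin rectangle
    return total

def integrand(t):
    """The integrand used in the listing above: exp(t^2)."""
    return math.exp(t ** 2)

if __name__ == "__main__":
    print(round(left_riemann_sum(integrand, 0.0, 1.0, 1000), 2))
```

Rounded to two decimal places, this agrees with the 1.46 quoted above.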


h / Problem 2.


page  83,  problem  2: 

The derivative of the cosine is minus the sine, so to get a function whose derivative is the sine, we need minus the cosine.

$$\int_0^{2\pi} \sin x\,dx = (-\cos x)\Big|_0^{2\pi}$$
$$= (-\cos 2\pi) - (-\cos 0)$$
$$= (-1) - (-1)$$
$$= 0$$


As  shown  in  figure  h,  the  graph  has  equal  amounts  of  area  above  and 
below  the  x  axis.  The  area  below  the  axis  counts  as  negative  area,  so 
the  total  is  zero. 

page  83,  problem  3: 


i  /  Problem  3. 

The  rectangular  area  of  the  graph  is  2,  and  the  area  under  the  curve 
fills  a  little  more  than  half  of  that,  so  let’s  guess  1.4. 


$$\int_0^2 (-x^2 + 2x)\,dx = \left(-\frac{x^3}{3} + x^2\right)\Big|_0^2 = \left(-\frac{8}{3} + 4\right) - (0) = \frac{4}{3}$$


This  is  roughly  what  we  were  expecting  from  our  visual  estimate. 
page 83, problem 4:

Over this interval, the value of the sin function varies from 0 to 1, and it spends more time above 1/2 than below it, so we expect the average to be somewhat greater than 1/2. The exact result is

$$\overline{\sin} = \frac{1}{\pi - 0} \int_0^\pi \sin x\,dx$$
$$= \frac{1}{\pi} (-\cos x)\Big|_0^\pi$$
$$= \frac{1}{\pi} \left[-\cos\pi - (-\cos 0)\right]$$
$$= \frac{2}{\pi},$$

which is, as expected, somewhat more than 1/2.

page  83,  problem  5: 

Consider a function $y(x)$ defined on the interval from $x = 0$ to 2 like this:

$$y(x) = \begin{cases} -1 & \text{if } 0 \le x \le 1 \\ 1 & \text{if } 1 < x \le 2 \end{cases}$$

The mean value of $y$ is zero, but $y$ never equals zero.


page  83,  problem  6: 

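Let $\dot{x}$ be defined as

$$\dot{x}(t) = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t \ge 0 \end{cases}$$

Integrating this function up to $t$ gives

$$x(t) = \begin{cases} 0 & \text{if } t < 0 \\ t & \text{if } t \ge 0 \end{cases}$$

The derivative of $x$ at $t = 0$ is undefined, and therefore integration followed by differentiation doesn’t recover the original function $\dot{x}$.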

page 83, problem 8: First we put the integrand into the more familiar and convenient form $cx^p$, whose integral is $(c/(p + 1))x^{p+1}$. $\sqrt{bx\sqrt{x}} = b^{1/2}x^{3/4}$. Applying the general rule, the result is $(4/7)b^{1/2}x^{7/4}$.

page 84, problem 11: The claim is false for indefinite integrals, since indefinite integrals can have a constant of integration. So, for example, a possible indefinite integral of $x^2$ is $x^3/3 + 7$, which is neither even nor odd. The fundamental theorem doesn’t even refer to indefinite integrals, which are simply defined through inverse differentiation.

Let’s fix the claim by changing $g$ to a definite integral, $g(x) = \int_0^x f(u)\,du$. The claim is now true. However, the proof still doesn’t quite work. We’ve established that all odd functions have even derivatives, but we haven’t ruled out possibilities such as functions that are neither even nor odd, but that have even derivatives.

Solutions  for  chapter  5 

page  99,  problem  16: 

It’s pretty trivial to generalize from $e$ to $b$. If we write $b^x$ as $e^{x \ln b}$, then we can substitute $u = x \ln b$ and reduce the $b \ne e$ case to $b = e$.

The generalization of the exponent of $x$ from 2 to $a$ is less straightforward. To do it with $a = 2$, we needed two integrations by parts, so clearly if we wanted to do a case with $a = 37$, we could do it with 37 integrations by parts. However, we would have no easy way to write down the complete answer without going through the whole tedious calculation. Furthermore, this is only going to work if $a$ is a positive integer.

page 99, problem 18: The obvious substitution is $u = x^p$, which leads to the form $\int e^u u^{1/p - 1}\,du$. If the exponent $1/p - 1$ equals a nonnegative integer $n$, then through $n$ integrations by parts, we can reduce this to the form $\int e^u\,du$. This requires $p = 1, 1/2, 1/3, \ldots$

page  99,  problem  19:  This  is  a  mess  if  attacked  by  brute  force.  The 
trick  is  to  reexpress  the  function  using  partial  fractions: 

$$\frac{x^2 + 1}{x^3 - x} = \frac{x^2 + 1}{2(x + 1)} + \frac{x^2 + 1}{2(x - 1)} - \frac{x^2 + 1}{x}$$

Writing $u = x + 1$ and $v = x - 1$, this becomes

$$u^{-1} + v^{-1} - x^{-1} + \ldots,$$

where $\ldots$ represents terms that will not survive multiple differentiations. Since $du/dx = dv/dx = 1$, the chain rule tells us that differentiation with respect to $u$ or $v$ is the same as differentiation with respect to $x$. The result is $100!\,(u^{-101} + v^{-101} - x^{-101})$, where the notation $100!$ means $1 \times 2 \times \ldots \times 100$.
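
The decomposition and the derivative formula can be sanity-checked with exact rational arithmetic. The Python sketch below is illustrative only; the function names are invented for the example, and `nth_derivative` relies on the standard fact that the $n$th derivative of $x^{-1}$ is $(-1)^n n!\,x^{-(n+1)}$.

```python
from fractions import Fraction
from math import factorial

def original(x):
    """The function as given: (x^2 + 1) / (x^3 - x)."""
    return (x * x + 1) / (x ** 3 - x)

def partial_fractions(x):
    """The three-term decomposition quoted in the solution."""
    top = x * x + 1
    return top / (2 * (x + 1)) + top / (2 * (x - 1)) - top / x

def nth_derivative(x, n):
    """n-th derivative of 1/(x+1) + 1/(x-1) - 1/x."""
    sign = -1 if n % 2 else 1
    return sign * factorial(n) * (
        (x + 1) ** -(n + 1) + (x - 1) ** -(n + 1) - x ** -(n + 1)
    )

if __name__ == "__main__":
    x = Fraction(5, 3)
    print(original(x) == partial_fractions(x))   # exact comparison
    print(nth_derivative(Fraction(2), 100) > 0)  # even order keeps the sign
```

For $n = 0$ the last function reduces to $1/(x+1) + 1/(x-1) - 1/x$, which equals the original function exactly, so in this case the terms written as $\ldots$ cancel among themselves.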

Solutions  for  chapter  6 


page  104,  problem  4: 
The method of finding the indefinite integral is discussed in example 70 on p. 91 and problem 16 on p. 99. The result is $-(\ln 2)^{-3} e^{u} (u^2 - 2u + 2)$, where $u = -x \ln 2$. Plugging in the limits of integration, we obtain $2(\ln 2)^{-3}$.
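
A numerical cross-check is easy to sketch in Python. The integrand is taken to be $x^2 2^{-x}$ on $[0, \infty)$, which is how problem 20 on p. 117 describes this integral; the upper cutoff of 100 and the midpoint rule are choices made for this example, not part of the book's solution.

```python
import math

LN2 = math.log(2.0)

def integrand(x):
    """Assumed integrand: x^2 * 2^(-x)."""
    return x * x * 2.0 ** (-x)

def antiderivative(x):
    """-(ln 2)^-3 * e^u * (u^2 - 2u + 2), with u = -x ln 2."""
    u = -x * LN2
    return -math.exp(u) * (u * u - 2.0 * u + 2.0) / LN2 ** 3

def midpoint_integral(f, a, b, steps):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

if __name__ == "__main__":
    print(midpoint_integral(integrand, 0.0, 100.0, 100000))
    print(antiderivative(100.0) - antiderivative(0.0))
    print(2.0 / LN2 ** 3)
```

All three printed values should agree to many digits, since the integrand is negligible beyond the cutoff.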

Solutions  for  chapter  7 

page  114,  problem  1: 

We can define the sequence $f(n)$ as converging to $\ell$ if the following is true: for any positive real number $\epsilon$, there exists an integer $N$ such that for all $n$ greater than $N$, the value of $f(n)$ lies within the range from $\ell - \epsilon$ to $\ell + \epsilon$.

page  114,  problem  2: 

(a) The convergence of the series is defined in terms of the convergence of its partial sums, which are 1, 0, 1, 0, $\ldots$ In the notation used in the definition given in the solution to problem 1 above, suppose we pick $\epsilon = 1/4$. Then there is clearly no way to choose any numbers $\ell$ and $N$ that would satisfy the definition, for regardless of $N$, $\ell$ would have to be both greater than 3/4 and less than 1/4 in order to agree with the zeroes and ones that occur beyond the $N$th member of the sequence.

(b)  As  remarked  on  page  106,  the  axioms  of  the  real  number  system, 
such  as  associativity,  only  deal  with  finite  sums,  not  infinite  ones.  To  see 
that  absurd  conclusions  result  from  attempting  to  apply  them  to  infinite 
sums,  consider  that  by  the  same  type  of  argument  we  could  group  the 
sum  as  1  +  (—1  +  1)  +  (—1  +  1)  +  . . .,  which  would  equal  1. 

page  114,  problem  3: 

The quantity $x^n$ can be reexpressed as $e^{n \ln x}$, where $\ln x$ is negative by hypothesis. The integral of this exponential with respect to $n$ is a similar exponential with a constant factor in front, and this converges as $n$ approaches infinity.

page  114,  problem  4: 

(a) Applying the integral test, we find that the integral of $1/x^2$ is $-1/x$, which converges as $x$ approaches infinity, so the series converges as well.

(b) This is an alternating series whose terms approach zero, so it converges. However, the terms get small extremely slowly, so an extraordinarily large number of terms would be required in order to get any kind of decent approximation to the sum. In fact, it is impossible to carry out a straightforward numerical evaluation of this sum because it would require such an enormous number of terms that the rounding errors would overwhelm the result.

(c)  This  converges  by  the  ratio  test,  because  the  ratio  of  successive  terms 
approaches  0. 

(d) Split the sum into two sums, one for the 1103 term and one for the $26390k$. The ratio of the two factorials is always less than $4^{4k}$, so discarding constant factors, the first sum is less than a geometric series with $x = (4/396)^4 < 1$, and must therefore converge. The second sum is less than a series of the form $kx^k$. This one also converges, by the integral test. (It has to be integrated with respect to $k$, not $x$, and the integration can be done by parts.) Since both separate sums converge, the entire sum converges. This bizarre-looking expression was formulated and shown to equal $1/\pi$ by the self-taught genius Srinivasa Ramanujan (1887-1920).
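
To see how quickly the series settles down, one can sum a few terms in Python. The sketch below assumes the standard statement of Ramanujan's formula, $1/\pi = \frac{2\sqrt{2}}{9801} \sum_k \frac{(4k)!\,(1103 + 26390k)}{(k!)^4\, 396^{4k}}$, which is quoted from the well-known result and not from the solution text above; the function names are invented for the example.

```python
import math

def ramanujan_inverse_pi(terms):
    """Partial sum of Ramanujan's series for 1/pi using `terms` terms."""
    total = 0.0
    for k in range(terms):
        numerator = math.factorial(4 * k) * (1103 + 26390 * k)
        denominator = math.factorial(k) ** 4 * 396 ** (4 * k)
        total += numerator / denominator   # exact integers, one float division
    return 2.0 * math.sqrt(2.0) / 9801.0 * total

def factorial_ratio(k):
    """(4k)! / (k!)^4, the ratio bounded by 4^(4k) in the argument above."""
    return math.factorial(4 * k) // math.factorial(k) ** 4

if __name__ == "__main__":
    for terms in (1, 2, 3):
        print(terms, 1.0 / ramanujan_inverse_pi(terms))
```

Each extra term contributes roughly eight more correct digits, so double precision is exhausted after two terms.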

page 114, problem 5: E.g., $\sum_{n=0}^{\infty} \sin n$ diverges, but the ratio test won’t establish that, because the limit $\lim_{n \to \infty} |\sin(n + 1)/\sin(n)|$ does not exist.

page 116, problem 14: The $n$th term $a_n$ can be rewritten as $2/[n(n + 1)]$, and using partial fractions this can be changed into $2/n - 2/(n + 1)$. Let the partial sums be $s_n = \sum_{j=1}^{n} a_j$. For insight, let’s write out $s_3$:

$$s_3 = \left(\frac{2}{1} - \frac{2}{2}\right) + \left(\frac{2}{2} - \frac{2}{3}\right) + \left(\frac{2}{3} - \frac{2}{4}\right)$$

This is called a telescoping series. The second part of one term cancels out with the first part of the next. Therefore we have

$$s_3 = \frac{2}{1} - \frac{2}{4},$$

and in general

$$s_n = \frac{2}{1} - \frac{2}{n + 1}.$$

Letting $n \to \infty$, we find that the series sums to 2.
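
The telescoping formula can be confirmed exactly for as many partial sums as desired. This Python sketch uses rational arithmetic so that no rounding is involved; the function names are invented for the example.

```python
from fractions import Fraction

def term(n):
    """The n-th term a_n = 2 / [n (n + 1)]."""
    return Fraction(2, n * (n + 1))

def partial_sum(n):
    """s_n computed by brute-force addition of the first n terms."""
    return sum(term(j) for j in range(1, n + 1))

def closed_form(n):
    """s_n from the telescoping argument: 2/1 - 2/(n + 1)."""
    return 2 - Fraction(2, n + 1)

if __name__ == "__main__":
    for n in (3, 10, 1000):
        print(n, partial_sum(n), closed_form(n))
```

For large $n$ the value $2 - 2/(n + 1)$ creeps up toward 2, in agreement with the limit found above.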

page  116,  problem  17:  Yes,  it  converges.  To  see  this,  consider  that  its 
graph  consists  of  a  series  of  peaks  and  valleys,  each  of  which  is  narrower 
than  the  last  and  therefore  has  less  area.  In  fact,  the  width  of  these 
humps  approaches  zero,  so  that  the  area  approaches  zero.  This  means 
that  the  integral  can  be  represented  as  a  decreasing,  alternating  series 
that  approaches  zero,  which  must  converge. 
page 115, problem 13: There are certainly some special values of $x$ for which it does converge, such as 0 and $\pi$. For a general value of $x$, however, things become more complicated. Let the $n$th term be given by the function $t(n)$. $|t|$ converges to a limit, since the first application of the sine function brings us into the range $0 \le |t| \le 1$, and from then on, $|t|$ is decreasing and bounded below by 0. It can’t approach a nonzero limit, for given such a limit $t^*$, there would always be values of $t$ slightly greater than $t^*$ such that $\sin t$ was less than $t^*$. Therefore the terms in the sum approach zero. This is necessary but not sufficient for the series to converge.

Once $t$ gets small enough, we can approximate the sine using a Taylor series. Approximating the discrete function $t$ by a continuous one, we have $dt/dn \approx -(1/6)t^3$, which can be rewritten as $t^{-3}\,dt \approx -(1/6)\,dn$. This is known as separation of variables. Integrating, we find that at large values of $n$, where the constant of integration becomes negligible, $t \approx \pm\sqrt{3/n}$. The sum diverges by the integral test. Therefore the sum diverges for all values of $x$ except for multiples of $\pi$, which cause $t$ to hit zero immediately without passing through the region where the Taylor series is a good approximation.
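
The asymptotic estimate $t \approx \sqrt{3/n}$ is easy to test numerically. The Python sketch below assumes, as the solution describes, that the $n$th term is obtained by applying the sine function $n$ times to $x$; the starting value 1.0 and the function names are choices made for the example.

```python
import math

def iterate_sine(x, n):
    """Apply the sine function n times to x."""
    t = x
    for _ in range(n):
        t = math.sin(t)
    return t

def ratio_to_asymptote(x, n):
    """|t(n)| divided by sqrt(3/n); this should approach 1 for large n."""
    return abs(iterate_sine(x, n)) * math.sqrt(n / 3.0)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, iterate_sine(1.0, n), ratio_to_asymptote(1.0, n))
```

The ratio drifts toward 1 only slowly, which is consistent with terms that shrink like $n^{-1/2}$ and therefore cannot give a convergent sum.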

page 117, problem 20: Our first impression is that it must converge, since the $2^{-n}$ factor shrinks much more rapidly than the $n^2$ factor grows. To prove this rigorously, we can apply the integral test. The relevant improper integral was carried out in problem 4 on p. 104.

Finding the sum is far more difficult, and there is no obvious technique that is guaranteed to work. However, the integral test suggests an approach that does lead to a solution. The fact that the indefinite integral can be evaluated suggests that perhaps the partial sum

$$S_n = \sum_{j=0}^{n} j^2 2^{-j}$$

can also be evaluated. Furthermore, the fact that the integral was of the form $2^{-x}P(x)$, for some polynomial $P$, suggests that perhaps $S_n$ is of the same form. Based on this conjecture, we try to determine the unknown coefficients in $P(n) = an^2 + bn + c$.

$$S_n - S_{n-1} = n^2 2^{-n}$$

$$n^2 2^{-n} = 2^{-n}\left[-an^2 + (4a - b)n - 2a + 2b - c\right]$$

Solving for $a$, $b$, and $c$ results in $P(n) = -n^2 - 4n - 6$. This gives the correct value for the difference $S_n - S_{n-1}$, but doesn’t give $S_0 = 0$ as it should. But this is easy to fix simply by changing the form of our conjectured partial sum slightly to $S_n = 2^{-n}P(n) + k$, where $k = 6$. Evaluating $\lim_{n \to \infty} S_n$, we get 6.
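
The conjectured closed form can be checked against brute-force partial sums. This Python sketch uses exact fractions; the function names are invented for the example.

```python
from fractions import Fraction

def partial_sum(n):
    """S_n = sum of j^2 * 2^(-j) for j = 0..n, added term by term."""
    return sum(Fraction(j * j, 2 ** j) for j in range(n + 1))

def closed_form(n):
    """S_n = 2^(-n) P(n) + 6 with P(n) = -n^2 - 4n - 6."""
    return 6 - Fraction(n * n + 4 * n + 6, 2 ** n)

if __name__ == "__main__":
    for n in (0, 1, 2, 10, 50):
        print(n, partial_sum(n) == closed_form(n), float(closed_form(n)))
```

The printed floating-point values approach 6 rapidly, since the correction term is suppressed by the factor $2^{-n}$.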

page 117, problem 21: The function $\cos^2$ averages to 1/2, so we might naively expect that $\cos^n$ would average to about $2^{-n/2}$, in which case the sum would converge for any value of $p$ whatsoever. But the average is misleading, because there are some “lucky” values of $n$ for which $\cos^2 n \approx 1$, and these will have a disproportionate effect on the sum. We know by the integral test that $\sum 1/n$ diverges, but $\sum 1/n^2$ converges, so clearly if $p \ge 2$, then even these occasional “lucky” terms will not cause divergence.

What about $p = 1$? Suppose we have some value of $n$ for which $\cos^2 n = 1 - \epsilon$, where $\epsilon$ is some small number. If this is to happen, then we must have $n = k\pi + \delta$, where $k$ is an integer and $\delta$ is small, so that $\cos^2 n \approx 1 - \delta^2$, i.e., $\epsilon \approx \delta^2$. This occurs with a probability proportional to $\delta$, and the resulting contribution to the sum is about $(1 - \delta^2)^n/n$, which by the binomial theorem is roughly of order $1/n$ if $n\delta^2 \sim 1$. This happens with probability $\sim n^{-1/2}$, so the expected value of the $n$th term is $\sim n^{-3/2}$. Since $\sum n^{-3/2}$ converges by the integral test, this suggests, but does not prove rigorously, that we also get convergence for $p = 1$.

A  similar  argument  suggests  that  the  sum  diverges  for  p  =  0. 

Answers  to  self-checks  for  chapter  9 


page 126, problem 9: First we rewrite the integrand as

$$\frac{1}{4}\left(e^{ix} + e^{-ix}\right)\left(e^{2ix} + e^{-2ix}\right) = \frac{1}{4}\left(e^{3ix} + e^{-3ix} + e^{ix} + e^{-ix}\right).$$

The indefinite integral is

$$\frac{1}{12i}\left(e^{3ix} - e^{-3ix}\right) + \frac{1}{4i}\left(e^{ix} - e^{-ix}\right).$$

Evaluating this at 0 gives 0, while at $\pi/2$ we find 1/3. The result is 1/3.
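
The evaluation at the endpoints can be repeated with Python's complex arithmetic. This is only a sketch; the integrand $\cos x \cos 2x$ used for the cross-check is inferred from the product of exponentials above, and the midpoint rule and function names are choices made for the example.

```python
import cmath
import math

def antiderivative(x):
    """(e^{3ix} - e^{-3ix}) / 12i + (e^{ix} - e^{-ix}) / 4i."""
    e = cmath.exp
    return (e(3j * x) - e(-3j * x)) / 12j + (e(1j * x) - e(-1j * x)) / 4j

def midpoint_integral(f, a, b, steps):
    """Midpoint-rule approximation, used as an independent cross-check."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

if __name__ == "__main__":
    value = antiderivative(math.pi / 2) - antiderivative(0.0)
    print(value)  # real part near 1/3, imaginary part near 0
    print(midpoint_integral(lambda x: math.cos(x) * math.cos(2 * x),
                            0.0, math.pi / 2, 10000))
```

The imaginary part vanishes up to rounding, as it must for the integral of a real function.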


page  126,  problem  8: 

$$\sin(a + b) = \left(e^{i(a+b)} - e^{-i(a+b)}\right)/2i$$
$$= \left(e^{ia}e^{ib} - e^{-ia}e^{-ib}\right)/2i$$
$$= \left[(\cos a + i \sin a)(\cos b + i \sin b) - (\cos a - i \sin a)(\cos b - i \sin b)\right]/2i$$
$$= \cos a \sin b + \sin a \cos b$$

By  a  similar  computation,  we  find  cos  (a  +  b)  =  cos  a  cos  b  —  sin  a  sin  b. 

page 126, problem 10: If $z^3 = 1$, then we know that $|z| = 1$, since cubing $z$ cubes its magnitude. Cubing $z$ triples its argument, so the argument of $z$ must be a number that, when tripled, is equivalent to an angle of zero. There are three possibilities: $0 \times 3 = 0$, $(2\pi/3) \times 3 = 2\pi$, and $(4\pi/3) \times 3 = 4\pi$. (Other possibilities, such as $(32\pi/3)$, are equivalent to one of these.) The solutions are:

$$z = 1,\ e^{2\pi i/3},\ e^{4\pi i/3}$$

page 126, problem 11: We can think of this as a polynomial in $x$ or a polynomial in $y$; their roles are symmetric. Let’s call $x$ the variable.

By the fundamental theorem of algebra, it must be possible to factor it into a product of three linear factors, if the coefficients are allowed to be complex. Each of these factors causes the product to be zero for a certain value of $x$. But the condition for the expression to be zero is $x^3 = y^3$, which basically means that the ratio of $x$ to $y$ must be a third root of 1. The problem, then, boils down to finding the three third roots of 1, as in problem 10. Using the result of that problem, we find that there are zeroes when $x/y$ equals 1, $e^{2\pi i/3}$, and $e^{4\pi i/3}$. This tells us that the factorization is $(x - y)(x - e^{2\pi i/3}y)(x - e^{4\pi i/3}y)$.

The second part of the problem asks us to factorize as much as possible using real coefficients. Our only hope of doing this is to multiply out the two factors that involve complex coefficients, and see if they produce something real. In fact, we can anticipate that it will work, because the coefficients are complex conjugates of one another, and when a quadratic has two complex roots, they are conjugates. The result is $(x - y)(x^2 + xy + y^2)$.
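
Both the roots found in problem 10 and the two factorizations in problem 11 can be verified numerically. The following Python sketch is illustrative; the function names and the sample values of $x$ and $y$ are invented for the example.

```python
import cmath
import math

def roots_of_unity(n):
    """The n complex numbers e^{2 pi i k / n}, k = 0..n-1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

def complex_factored(x, y):
    """(x - y)(x - e^{2 pi i/3} y)(x - e^{4 pi i/3} y)."""
    _, w1, w2 = roots_of_unity(3)
    return (x - y) * (x - w1 * y) * (x - w2 * y)

def real_factored(x, y):
    """(x - y)(x^2 + xy + y^2)."""
    return (x - y) * (x * x + x * y + y * y)

if __name__ == "__main__":
    for z in roots_of_unity(3):
        print(z, z ** 3)                 # each cube should be close to 1
    print(complex_factored(2.0, 0.5))    # compare with 2^3 - 0.5^3 = 7.875
    print(real_factored(2.0, 0.5))
```

The complex product comes out with an imaginary part of order $10^{-16}$, reflecting the fact that the two complex factors are conjugates whose product is real.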

page 126, problem 14: Applying the differential equation to the form suggested gives $abx^{b-1} = a^{b+1}x^{b^2}$. The exponents must be equal on both sides, so $b$ must be a solution of $b^2 - b + 1 = 0$. The solutions are $b = (1 \pm \sqrt{3}i)/2$. For a more detailed discussion of this cute problem, see mathoverflow.net/questions/111066.

page 127, problem 15: (a) Let $m = 10{,}000$. We know that integrals of this form can be done, at least in theory, using partial fractions. The ten thousand roots of the polynomial will be ten thousand points evenly spaced around the unit circle in the complex plane. They can be expressed as $r_k = e^{2\pi ik/m}$ for $k = 0$ to $m - 1$. Since all the roots are unequal, the partial-fraction form of the integrand contains only terms of the form $A_k/(x - r_k)$. Integrating, we would get a sum of ten thousand terms of the form $A_k \ln(x - r_k)$.

(b) I tried inputting the integral into three different pieces of symbolic math software: the open-source packages Yacas and Maxima, and the web-based interface to Wolfram’s proprietary Mathematica software at integrals.com. Maxima gave a partially integrated result after a couple of minutes of computation. Yacas crashed. Mathematica’s web interface timed out and suggested buying a stand-alone copy of Mathematica. All three programs probably embarked on the computation of the $A_k$ by attempting to solve 10,000 equations in the 10,000 unknowns $A_k$, and then ran out of resources (either memory or CPU time).

(c) The expressions look nicer if we let $\omega = e^{2\pi i/m}$, so that $r_k = \omega^k$. The residue method gives

$$\frac{1}{x^m - 1} = \sum_{k=0}^{m-1} \frac{1}{(x - \omega^k)\,m\,\omega^{k(m-1)}}.$$

Integration gives

$$\int \frac{dx}{x^m - 1} = \sum_{k=0}^{m-1} \frac{1}{m\,\omega^{k(m-1)}} \ln(x - \omega^k).$$

(Thanks to math.stackexchange.com user zulon for suggesting the residue method, and to Robert Israel for pointing out that for $|x| < 1$ this can also be expressed as a hypergeometric function: $(-x)\,{}_2F_1$
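
The residue-method expansion in part (c) can be spot-checked numerically for a small value of $m$, where the sum is cheap to evaluate. This Python sketch is an illustration only; $m = 6$ and the test point are arbitrary choices, and the function names are invented for the example.

```python
import cmath
import math

def direct(x, m):
    """The integrand 1 / (x^m - 1), evaluated directly."""
    return 1.0 / (x ** m - 1.0)

def residue_sum(x, m):
    """Sum over k of 1 / [(x - w^k) * m * w^(k(m-1))], with w = e^{2 pi i/m}."""
    omega = cmath.exp(2j * math.pi / m)
    total = 0j
    for k in range(m):
        root = omega ** k
        total += 1.0 / ((x - root) * m * root ** (m - 1))
    return total

if __name__ == "__main__":
    x = 0.3 + 0.2j
    print(direct(x, 6))
    print(residue_sum(x, 6))
```

Each coefficient is written down directly from its own root, with no system of equations to solve, which is what makes the residue method practical even when $m$ is large.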
D References and Further Reading


Further  Reading 

The  amount  of  high-quality  material  on  elementary  calculus  available 
for  free  online  these  days  is  an  embarrassment  of  riches,  so  most  of  my 
suggestions  for  reading  are  online.  I’ll  refer  to  books  in  this  section  only 
by  the  surname  of  the  first  author;  the  references  section  below  tells  you 
where  to  find  the  book  online  or  in  print. 

The  reader  who  wants  to  learn  more  about  the  hyperreal  system  might 
want  to  start  with  Stroyan  and  the  Mathforum.org  article.  For  more 
depth,  one  could  next  read  the  relevant  parts  of  Keisler.  The  standard 
(difficult)  treatise  on  the  subject  is  Robinson. 

Given sufficient ingenuity, it’s possible to develop a surprisingly large amount of the machinery of calculus without using limits or infinitesimals. Two examples of such treatments that are freely available online are Marsden and Livshits. Marsden gives a geometrical definition of the derivative similar to the one used in ch. 1 of this book, but in my opinion his efforts to develop a sufficient body of techniques without limits or infinitesimals end up bogging down in complicated formulations that have the same flavor as the Weierstrass definition of the limit and are just as complicated. Livshits treats differentiation of rational functions as division of functions.

Tall  gives  an  interesting  construction  of  a  number  system  that  is  smaller 
than  the  hyperreals,  but  easier  to  construct  explicitly,  and  sufficient  to 
handle  calculus  involving  analytic  functions. 
E Reference


E.1 Review

Algebra

Quadratic equation:

The solutions of $ax^2 + bx + c = 0$ are $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.

Logarithms and exponentials:

$\ln(ab) = \ln a + \ln b$
$e^{a+b} = e^a e^b$
$\ln e^x = e^{\ln x} = x$
$\ln(a^b) = b \ln a$

Geometry, area, and volume

area of a triangle of base $b$ and height $h$ = $\frac{1}{2}bh$
circumference of a circle of radius $r$ = $2\pi r$
area of a circle of radius $r$ = $\pi r^2$
surface area of a sphere of radius $r$ = $4\pi r^2$
volume of a sphere of radius $r$ = $\frac{4}{3}\pi r^3$

Trigonometry with a right triangle

$h$ = hypotenuse
$o$ = opposite side
$a$ = adjacent side

$\sin\theta = o/h$   $\cos\theta = a/h$   $\tan\theta = o/a$

Pythagorean theorem: $h^2 = a^2 + o^2$

Trigonometry with any triangle

Law of Sines:

$\frac{\sin\alpha}{A} = \frac{\sin\beta}{B} = \frac{\sin\gamma}{C}$

Law of Cosines:

$C^2 = A^2 + B^2 - 2AB\cos\gamma$

E.2 Hyperbolic functions

$\sinh x = \frac{e^x - e^{-x}}{2}$

$\cosh x = \frac{e^x + e^{-x}}{2}$

$\tanh x = \frac{\sinh x}{\cosh x}$
E.3 Calculus

Let $f$ and $g$ be functions of $x$, and let $c$ be a constant.

Rules for differentiation

Linearity of the derivative:

$\frac{d}{dx}(cf) = c\frac{df}{dx}$
$\frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}$

The chain rule:

$\frac{d}{dx}f(g(x)) = f'(g(x))\,g'(x)$

Derivatives of products and quotients:

$\frac{d}{dx}(fg) = \frac{df}{dx}g + \frac{dg}{dx}f$
$\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{f'}{g} - \frac{fg'}{g^2}$

Integral calculus

The fundamental theorem of calculus:

$\frac{d}{dx}\int f\,dx = f$

Linearity of the integral:

$\int cf(x)\,dx = c\int f(x)\,dx$
$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$

Integration by parts:

$\int f\,dg = fg - \int g\,df$

Table of integrals

$\int x^m\,dx = \frac{1}{m + 1}x^{m+1} + c, \quad m \ne -1$
$\int \frac{dx}{x} = \ln x + c$
$\int \sin x\,dx = -\cos x + c$
$\int \cos x\,dx = \sin x + c$
$\int e^x\,dx = e^x + c$
$\int \ln x\,dx = x \ln x - x + c$
$\int \frac{dx}{1 + x^2} = \tan^{-1} x + c$
$\int \frac{dx}{\sqrt{1 - x^2}} = \sin^{-1} x + c$
$\int \cosh x\,dx = \sinh x + c$
$\int \sinh x\,dx = \cosh x + c$
$\int \tan x\,dx = -\ln|\cos x| + c$
$\int \cot x\,dx = \ln|\sin x| + c$
$\int \sec x\,dx = \ln|\sec x + \tan x| + c$
$\int \sec^2 x\,dx = \tan x + c$
$\int \csc^2 x\,dx = -\cot x + c$
Apery’s constant, 116
average, 76

Basel  problem,  116 
Berkeley,  George,  30 
boundary  point,  159 

calculus 
fundamental theorem of
proof, 154
integral, 13

Cartesian  coordinates,  133 
change of variables, 87
chromatic  scale,  115 
compact  set,  159 
completeness,  157 
complex  number,  119 
argument  of,  120 
conjugate  of,  120 
composition, 53
concavity,  16 
conjugate,  120 
continuous  function,  53 
coordinates 

Cartesian,  133 
cylindrical,  135 
polar,  133 
spherical,  135 
cosine 

derivative  of,  29 
cylindrical  coordinates,  135 


derivative 

chain  rule,  37 

defined  using  a  limit,  31,  46,  59 
defined  using  infinitesimals,  34 
definition  using  tangent  line,  13 
of  a  polynomial,  14,  140 
of a second-order polynomial, 14
of the exponential, 39, 151
of  the  logarithm,  40 
of  the  sine,  28,  141 
product  rule,  35 
properties  of,  14 
second,  15 
undefined,  18 
Descartes,  Rene,  133 
differentiation 

computer-aided,  43 
numerical,  45 
symbolic,  43 
implicit,  86 

errors 

propagation  of,  19 
Euclid,  105 
Euler,  116 
Euler’s  formula,  122 
Euler,  Leonhard,  123 
exponential 

definition  of,  151 
derivative  of,  39 
extreme  value  theorem,  56 
proof,  159 

extremum  of  a  function,  17 

factorial,  9,  110 
fission,  137 

fundamental  theorem  of  algebra 
proof,  162 
statement,  122 
fundamental theorem of calculus
proof,  154 
statement,  74 

Galileo,  11 

Gauss,  Carl  Friedrich,  7 
portrait,  7 

geometric  series,  29,  105 
halo,  33 

Holditch’s  theorem,  84 
hyperbolic  cosine,  48 
hyperbolic  tangent,  49 
hyperinteger,  150 
hyperreal  number,  31 

imaginary  number,  119 
implicit  differentiation,  86 
improper  integral,  101 
indeterminate  form,  63 
Inf  (calculator),  27 
infinitesimal  number,  25 
criticism  of,  30 
safe  use  of,  30 
infinity,  25 
inflection  point,  17 
integral,  13 
definite 

definition,  74 
improper,  101 
indefinite 

definition,  73 
iterated,  129 
properties  of,  75 
integral  test,  107 
integration 

computer-aided 
numerical,  73 
symbolic,  44 
methods  of 
by  parts,  89 
change  of  variable,  87 
partial  fractions,  91,  124 
substitution,  87 

intermediate  value  theorem,  54,  156 
iterated  integral,  129 


Kepler,  Johannes,  85 

l’Hopital’s  rule 

general  form,  65 
proofs,  152 
simplest  form,  61 
Leibniz  notation 
derivative,  26 
infinitesimal,  26 
integral,  73 
Leibniz,  Gottfried,  25 
limit,  31 

definition 

infinitesimals,  58 
Weierstrass,  58 
liquid  drop  model,  137 
logarithm 

definition  of,  40 

magnitude  of  a  complex  number,  120 
maximum  of  a  function,  17 
mean  value  theorem 
proof,  161 
statement,  76 
minimum  of  a  function,  17 
model,  145 

moment  of  inertia,  131 

Newton’s  method,  85 
Newton,  Isaac,  10 
normalization,  77 
nucleus,  137 

partial  fractions,  91,  124 
residue  method,  94 
periodic  function,  178 
planets,  motion  of,  85 
polar  coordinates,  133 
probability,  77 
product  rule,  35 
propagation  of  errors,  19 

quantifier,  143 
quotient 

derivative  of,  42 

radius  of  convergence,  111 
ratio test, 107
residue  method,  94 
Robinson,  Abraham,  31 
Rolle’s  theorem,  76 

sequence,  105 
series 

geometric,  29,  105 
infinite,  105 
Taylor,  108 
telescoping,  193 
series,  infinite,  109 
sine 

derivative  of,  28 
Sophomore’s  dream,  115 
spherical  coordinates,  135 
standard  deviation,  81 
standard  part,  34 
substitution,  87 